blacksheep package

blacksheep.make_outliers_table(df: pandas.core.frame.DataFrame, iqrs: float = 1.5, up_or_down: str = 'up', aggregate: bool = True, save_outlier_table: bool = False, save_frac_table: bool = False, output_prefix: str = 'outliers', ind_sep: str = '-') → blacksheep.classes.OutlierTable

Converts a DataFrame of values into an OutlierTable object, which includes a DataFrame of outlier and non-outlier count values.

Parameters

df – Input DataFrame with samples as columns and sites/genes as rows.
iqrs – The number of interquartile ranges (IQRs) above or below the median to consider a value as an outlier.
up_or_down – Whether to call up or down outliers. Up is above the median; down is below the median. Options “up” or “down”.
aggregate – Whether to sum outliers across a grouping (e.g. gene level) rather than reporting individual sites. For instance, if the row labels are phosphosites on proteins in the format “RAG2-S365”, the output shows outlier counts per protein (e.g. RAG2) rather than per site (e.g. RAG2-S365).
save_outlier_table – Whether to write a file with the outlier count table.
save_frac_table – Whether to write a file with the outlier fraction table.
output_prefix – If files are written, a prefix for the files.
ind_sep – The separator used in site labels, for instance to separate a gene from a site. If only genes are used (i.e. there is no separator), or if not aggregating, this parameter has no effect.

Returns: outliers: An OutlierTable object with outlier and non-outlier counts and metadata about how the outliers were called.
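A minimal pandas sketch of the outlier rule and the aggregation step described above. It is not the library's implementation: the helper names are hypothetical, the median and IQR are assumed to be computed per row across samples, and the gene name is assumed to be everything before the first ind_sep.

```python
import pandas as pd


def call_outliers_sketch(df: pd.DataFrame, iqrs: float = 1.5, up_or_down: str = "up") -> pd.DataFrame:
    """Mark outlier values (hypothetical helper; rows = sites/genes, columns = samples)."""
    # Per-row statistics computed across samples; NaN values are skipped.
    median = df.median(axis=1)
    iqr = df.quantile(0.75, axis=1) - df.quantile(0.25, axis=1)
    if up_or_down == "up":
        return df.gt(median + iqrs * iqr, axis=0)
    if up_or_down == "down":
        return df.lt(median - iqrs * iqr, axis=0)
    raise ValueError("up_or_down must be 'up' or 'down'")


def aggregate_outliers_sketch(is_outlier: pd.DataFrame, ind_sep: str = "-") -> pd.DataFrame:
    """Sum per-site outlier calls into per-gene counts, e.g. 'RAG2-S365' -> 'RAG2'."""
    # Assumption: the gene name is everything before the first separator.
    genes = [label.split(ind_sep, 1)[0] for label in is_outlier.index]
    return is_outlier.astype(int).groupby(genes).sum()
```

Counting non-outliers per group would follow the same pattern, using the non-missing values that were not called as outliers.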

blacksheep.compare_groups_outliers(outliers: blacksheep.classes.OutlierTable, annotations: pandas.core.frame.DataFrame, frac_filter: Optional[float] = 0.3, save_qvalues: bool = False, output_prefix: str = 'outliers', save_comparison_summaries: bool = False) → blacksheep.classes.qValues

Takes an OutlierTable object and a sample annotation DataFrame and performs comparisons for every column in annotations with exactly 2 groups. For each group identified in the annotations DataFrame, this function calculates q-values for the enrichment of outliers in that group, for each row.

Parameters

outliers – An OutlierTable, with a DataFrame of outlier and non-outlier counts, as well as parameters for how outliers were calculated.
annotations – A DataFrame with samples as rows and annotations as columns. Each column must contain exactly 2 different categories, not counting missing values. Columns without exactly 2 categories are ignored.
frac_filter – The fraction of samples in the group of interest that must have an outlier value for a row to be considered in the comparison. Float between 0 and 1, or None.
save_qvalues – Whether to write a file with a table of q-values.
output_prefix – If files are written, a prefix for the files.
save_comparison_summaries – Whether to write a file for each annotation column with the counts in the Fisher table, p-values and q-values per row.

Returns: qvals: A qValues object, which includes a DataFrame of q-values for each comparison, as well as some metadata about how the comparisons were performed.
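The comparison builds a Fisher table of outlier and non-outlier counts for each row and converts p-values to q-values. A dependency-free sketch, assuming a one-sided Fisher's exact test for enrichment in the group of interest and Benjamini-Hochberg correction; both choices, and the helper names, are assumptions rather than the documented behavior.

```python
from math import comb
from typing import List


def fisher_greater_sketch(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value for enrichment in cell a.

    Assumed table layout:
        [[a, b],   a = outliers in group of interest, b = non-outliers in group of interest
         [c, d]]   c = outliers in other group,       d = non-outliers in other group
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as the observed one.
    upper = min(row1, col1)
    return sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)) / denom


def bh_qvalues_sketch(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q
```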

blacksheep.deva(df: pandas.core.frame.DataFrame, annotations: pandas.core.frame.DataFrame, iqrs: float = 1.5, up_or_down: str = 'up', aggregate: bool = True, save_outlier_table: bool = False, save_frac_table: bool = False, frac_filter: Optional[float] = 0.3, save_qvalues: bool = False, output_prefix: str = 'outliers', ind_sep: str = '-', save_comparison_summaries: bool = False) → Tuple[blacksheep.classes.OutlierTable, blacksheep.classes.qValues]

Takes a DataFrame of values and returns OutlierTable and qValues objects. This command runs the whole outliers pipeline. The DataFrame in the OutlierTable object can be reused later to run further comparisons, and the qValues object can be used for visualization or for writing significant gene lists.

Parameters

df – Input DataFrame with samples as columns and sites/genes as rows.
annotations – A DataFrame with samples as rows and annotations as columns. Each column must contain exactly 2 different values, not counting missing values. Other columns are ignored.
iqrs – The number of interquartile ranges (IQRs) above or below the median to consider a value as an outlier.
up_or_down – Whether to call up or down outliers. Up is above the median; down is below the median. Options “up” or “down”.
aggregate – Whether to sum outliers across a grouping (e.g. gene level) rather than reporting individual sites. For instance, if the row labels are phosphosites on proteins in the format “RAG2-S365”, the output shows outlier counts per protein (e.g. RAG2) rather than per site (e.g. RAG2-S365).
save_outlier_table – Whether to write a file with the outlier count table.
save_frac_table – Whether to write a file with the outlier fraction table.
frac_filter – The fraction of samples in the group of interest that must have an outlier value for a row to be considered in the comparison. Float between 0 and 1, or None.
save_qvalues – Whether to write a file with a table of q-values.
output_prefix – If files are written, a prefix for the files.
ind_sep – The separator used in the row labels, for instance to separate a gene from a site. If only genes are used (i.e. there is no separator), or if not aggregating, this parameter has no effect.
save_comparison_summaries – Whether to write a table for each comparison with the counts in the Fisher table, p-values and q-values per row.

Returns: outliers, qvals: An OutlierTable object and a qValues object.
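The frac_filter parameter, shared with compare_groups_outliers, can be read as a per-row prefilter. A minimal sketch with a hypothetical helper, assuming a sample counts as having an outlier when its count is above zero, that the threshold is inclusive, and that None disables the filter:

```python
from typing import List, Optional

import pandas as pd


def rows_passing_frac_filter(
    outlier_counts: pd.DataFrame, group_samples: List[str], frac_filter: Optional[float] = 0.3
) -> pd.Index:
    """Rows where enough samples in the group of interest have an outlier (hypothetical helper)."""
    if frac_filter is None:
        return outlier_counts.index  # assumption: None disables the filter
    # A sample "has an outlier value" in a row if its outlier count is above zero.
    has_outlier = outlier_counts[group_samples] > 0
    frac = has_outlier.mean(axis=1)
    # Assumption: the threshold is inclusive (>=).
    return outlier_counts.index[frac >= frac_filter]
```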

blacksheep.plot_heatmap(annotations: pandas.core.frame.DataFrame, qvals: pandas.core.frame.DataFrame, col_of_interest: str, vis_table: pandas.core.frame.DataFrame, fdr: float = 0.05, red_or_blue: str = 'red', output_prefix: str = 'outliers', colors: Optional[str] = None, savefig: bool = False) → list

Plots a heatmap of significantly enriched values for a given comparison.

Parameters

annotations – Annotations DataFrame, samples as rows, annotations as columns
qvals – q-values DataFrame with genes/sites as rows and comparisons as columns
col_of_interest – Which column from qvals should be used to find significant genes
vis_table – Table to be visualized in the heatmap. Index values should correspond to the annotations DataFrame index, and column names should correspond to the qvals DataFrame index
fdr – FDR threshold for significance
red_or_blue – Whether the heatmap should use a red or blue color scale
output_prefix – If saving files, the output prefix
colors – Path to a file with the color map for the annotation header
savefig – Whether to save the plot to a PDF

Returns: [annot_ax, vals_ax, cbar_ax, leg_ax]: List of matplotlib axes that can be further customized before saving. In order, the axes contain the annotation header, the heatmap, the color bar, and the legend.
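The selection of significant rows for the heatmap can be sketched as below, following the parameter descriptions above. The helper name is hypothetical, and treating "significant" as strictly below fdr is an assumption.

```python
import pandas as pd


def select_significant_sketch(
    qvals: pd.DataFrame, col_of_interest: str, vis_table: pd.DataFrame, fdr: float = 0.05
) -> pd.DataFrame:
    """Keep only vis_table columns whose q-value in col_of_interest passes the FDR threshold."""
    # Assumption: "significant" means strictly below fdr.
    significant = qvals.index[qvals[col_of_interest] < fdr]
    # Per the parameter description, vis_table columns correspond to the qvals index.
    keep = [label for label in significant if label in vis_table.columns]
    return vis_table[keep]
```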

blacksheep.run_simulations(infile, ind_sep, thresh, reps, outfile, genes, pval)

blacksheep.binarize_annotations(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Takes an annotation DataFrame, checks how many distinct values each column has, and adjusts accordingly. Columns with 0 or 1 distinct values are dropped, columns with exactly 2 are kept as-is, and columns with more than 2 are expanded: for each value in such a column, a new column is created with val and not_val options.

Parameters: df – Annotations DataFrame.

Returns: new_df: Refactored annotations DataFrame.
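The drop/keep/expand rule can be sketched in pandas. The helper name and the new column naming scheme ({col}_{val}) are assumptions for illustration; the library's output labels may differ.

```python
import numpy as np
import pandas as pd


def binarize_annotations_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Drop, keep, or expand annotation columns by their number of distinct values."""
    new_cols = {}
    for col in df.columns:
        values = df[col].dropna().unique()
        if len(values) < 2:
            continue  # 0 or 1 distinct values: nothing to compare, so drop
        if len(values) == 2:
            new_cols[col] = df[col]  # already binary: keep as-is
            continue
        for val in values:
            # One new column per value with val / not_val; missing values stay missing.
            labels = np.where(df[col] == val, str(val), f"not_{val}")
            expanded = pd.Series(labels, index=df.index)
            new_cols[f"{col}_{val}"] = expanded.where(df[col].notna())
    return pd.DataFrame(new_cols, index=df.index)
```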

blacksheep.normalize(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Performs median-of-ratios normalization on a given DataFrame, then a log2 transform.

Parameters: df – Unnormalized values DataFrame

Returns: Normalized DataFrame
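Median-of-ratios normalization divides each sample by a size factor: the median, over rows, of that sample's ratio to a per-row geometric-mean pseudo-reference. A minimal sketch, assuming rows are genes and columns are samples, and that rows containing a zero are excluded from the size-factor estimate (zeros are not otherwise handled here, and the library may treat them differently):

```python
import numpy as np
import pandas as pd


def median_of_ratios_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Median-of-ratios normalization, then log2 (rows = genes, columns = samples)."""
    with np.errstate(divide="ignore"):
        log_df = np.log(df)  # zeros become -inf
    # Pseudo-reference: per-row geometric mean, computed on the log scale.
    log_ref = log_df.mean(axis=1)
    usable = np.isfinite(log_ref)  # rows containing a zero are excluded
    # Size factor per sample: median ratio of that sample to the pseudo-reference.
    log_size_factors = log_df[usable].sub(log_ref[usable], axis=0).median(axis=0)
    normalized = df / np.exp(log_size_factors)
    with np.errstate(divide="ignore"):
        return np.log2(normalized)
```

In this toy case, sample s2 is exactly twice s1, so after normalization both samples have identical values.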

blacksheep.read_in_values(path: str) → pandas.core.frame.DataFrame

Detects the delimiter and parses the file into a DataFrame.

Parameters: path – File path

Returns: df: DataFrame from table in file
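One way to detect the delimiter is Python's csv.Sniffer. This is a hypothetical sketch, not necessarily how read_in_values does it; it assumes comma- or tab-separated input with row labels in the first column.

```python
import csv

import pandas as pd


def read_in_values_sketch(path: str) -> pd.DataFrame:
    """Guess the delimiter from the start of the file, then parse it with pandas."""
    with open(path, newline="") as handle:
        sample = handle.read(4096)
    # csv.Sniffer infers the delimiter from a text sample; restrict it to comma or tab.
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
    # Assumption: the first column holds the row labels (sites/genes).
    return pd.read_csv(path, sep=dialect.delimiter, index_col=0)
```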

blacksheep.read_in_outliers(path: str, updown: str, iqrs: float) → blacksheep.classes.OutlierTable

Parses a file into an OutlierTable object.

Parameters

path – File path
updown – Whether the outliers represent up or down outliers
iqrs – How many IQRs were used to define an outlier

Returns: outliers: OutlierTable object

class blacksheep.qValues(df: pandas.core.frame.DataFrame, comps: list, frac_filter: Optional[float])

Bases: object

Output from comparing groups using outliers.

make_signed_logqs() → pandas.core.frame.DataFrame

Creates a DataFrame with signed log10 q-values for each comparison: group 1 q-values are positive and group 2 q-values are negative. Which group is positive depends on the order of the columns in the q-value table, so it may be helpful to negate some columns of the output depending on the group of interest.

Returns: DataFrame with signed qvalues.
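One plausible way to combine a comparison's two q-value columns into a single signed score is shown below. This is an assumption about the behavior, not the method's actual code: each row takes -log10(q) from group 1 (positive) when group 1 is at least as significant, and log10(q) from group 2 (negative) otherwise.

```python
import numpy as np
import pandas as pd


def signed_logq_sketch(q_group1: pd.Series, q_group2: pd.Series) -> pd.Series:
    """Signed log10 q-values for one comparison (hypothetical helper)."""
    positive = -np.log10(q_group1)  # group 1 enrichment scores above zero
    negative = np.log10(q_group2)   # group 2 enrichment scores below zero
    # Assumption: keep whichever group is more significant; ties go to group 1.
    return positive.where(q_group1 <= q_group2, negative)
```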

write_gene_lists(fdr_cut_off: float = 0.01, output_prefix: str = 'outliers', comparisons: Optional[List] = None)

Writes significant gene list files for every column in a q-value table.

Parameters

fdr_cut_off – FDR threshold for significance
output_prefix – Output prefix for files
comparisons – Which subset of q-value columns to write gene lists for. By default, gene lists are written for all columns.

Returns: None
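The selection behind each gene list can be sketched as follows. The helper is hypothetical and returns the lists instead of writing files; treating significance as strictly below fdr_cut_off is an assumption, and the library's file names are not reproduced here.

```python
from typing import Dict, List, Optional

import pandas as pd


def significant_gene_lists_sketch(
    qvals: pd.DataFrame, fdr_cut_off: float = 0.01, comparisons: Optional[List[str]] = None
) -> Dict[str, List[str]]:
    """Map each selected q-value column to the rows that pass the FDR cut-off."""
    # Default: every column in the q-value table.
    columns = list(qvals.columns) if comparisons is None else list(comparisons)
    return {col: qvals.index[qvals[col] < fdr_cut_off].tolist() for col in columns}
```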

class blacksheep.OutlierTable(df: pandas.core.frame.DataFrame, updown: str, iqrs: Optional[float], samples: Optional[list], frac_table: Optional[pandas.core.frame.DataFrame])

Bases: object

Output of calling outliers.

Submodules

blacksheep.catheat module

blacksheep.catheat.heatmap(data, cmap: Optional[dict] = None, palette: str = 'hls', ax=None, legend: bool = True, leg_pos: str = 'right', leg_ax=None, leg_kws: Optional[dict] = None, **sns_kws)

Plots a categorical heatmap using seaborn.

Parameters

data (rectangular dataset) – 2D dataset that can be coerced into an ndarray. If a Pandas DataFrame is provided, the index/column information will be used to label the columns and rows.
cmap (dict, optional) – Colors for each category in the dataset. Missing colors will be added from the palette.
palette (matplotlib/seaborn color palette name or object, optional) – Palette to be used for heatmap.
ax (matplotlib ax, optional) – Axes on which to draw the heatmap.
legend (bool, optional) – If True, plot legend.
leg_ax (matplotlib axis, optional) – By default, will add legend to same ax as heatmap. Use this argument to explicitly set legend ax.
leg_pos ({'right', 'top'}) – Position of legend. Only relevant if legend ax is not explicitly provided via leg_ax.
leg_kws (dict, optional) – Keyword arguments passed to plt.legend()
**sns_kws – Keyword arguments passed through to seaborn.heatmap()

Returns

matplotlib axis – Heatmap axis as returned by seaborn.heatmap().
colormap (dict) – Colormap mapping categorical values to RGB colors.
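The core of a categorical heatmap is a mapping from categories to colors and integer codes, which can then be drawn with seaborn.heatmap() and a matplotlib ListedColormap. A standard-library sketch of that mapping step, with hypothetical helper names; evenly spaced HLS hues stand in for the 'hls' palette and only approximate it.

```python
import colorsys
from typing import Dict, Hashable, List, Optional, Tuple

RGB = Tuple[float, float, float]


def build_category_colormap(values: List[Hashable], cmap: Optional[Dict[Hashable, RGB]] = None) -> Dict[Hashable, RGB]:
    """Keep user-supplied colors and fill missing categories from evenly spaced hues."""
    colormap = dict(cmap or {})
    missing = [v for v in dict.fromkeys(values) if v not in colormap]
    n = len(missing)
    for i, v in enumerate(missing):
        # Lightness 0.6 and saturation 0.65 approximate an 'hls'-style palette.
        colormap[v] = colorsys.hls_to_rgb(i / n, 0.6, 0.65)
    return colormap


def encode_categories(rows: List[List[Hashable]], colormap: Dict[Hashable, RGB]) -> List[List[int]]:
    """Replace each category with its index in the colormap, ready for a numeric heatmap."""
    codes = {cat: i for i, cat in enumerate(colormap)}
    return [[codes[v] for v in row] for row in rows]
```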

blacksheep.classes module

class blacksheep.classes.OutlierTable(df: pandas.core.frame.DataFrame, updown: str, iqrs: Optional[float], samples: Optional[list], frac_table: Optional[pandas.core.frame.DataFrame])

Bases: object

Output of calling outliers.

blacksheep.classes.list_to_file(lis: Iterable, filename: str)

Takes an iterable and a file path and writes a value per line from the iterable into the new file.

Parameters

lis – Iterable to write to file
filename – Filename to write to.

Returns

None
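A minimal sketch of the described behavior, with a hypothetical name; writing str(item) followed by a newline is an assumption about the exact formatting.

```python
from typing import Iterable


def list_to_file_sketch(lis: Iterable, filename: str) -> None:
    """Write one value per line from an iterable into a new file."""
    with open(filename, "w") as handle:
        for item in lis:
            handle.write(f"{item}\n")
```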

blacksheep.classes.make_frac_table(df, samples)

Constructs the fraction table from the outlier table.

Returns: A DataFrame with one column per sample, with the fraction of outliers per row per sample. This table is useful for visualization but not statistics.
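A sketch of the fraction computation, assuming (hypothetically) that the outlier table holds a {sample}_outliers and a {sample}_notOutliers count column for each sample:

```python
from typing import List

import pandas as pd


def make_frac_table_sketch(df: pd.DataFrame, samples: List[str]) -> pd.DataFrame:
    """Fraction of outliers per row per sample (hypothetical column naming)."""
    fracs = {}
    for sample in samples:
        outliers = df[f"{sample}_outliers"]
        not_outliers = df[f"{sample}_notOutliers"]
        # Rows with no values at all give 0/0, which pandas reports as NaN.
        fracs[sample] = outliers / (outliers + not_outliers)
    return pd.DataFrame(fracs, index=df.index)
```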

class blacksheep.classes.qValues(df: pandas.core.frame.DataFrame, comps: list, frac_filter: Optional[float])

Bases: object

Output from comparing groups using outliers.

make_signed_logqs() → pandas.core.frame.DataFrame

Creates a DataFrame with signed log10 q-values for each comparison: group 1 q-values are positive and group 2 q-values are negative. Which group is positive depends on the order of the columns in the q-value table, so it may be helpful to negate some columns of the output depending on the group of interest.

Returns: DataFrame with signed qvalues.

write_gene_lists(fdr_cut_off: float = 0.01, output_prefix: str = 'outliers', comparisons: Optional[List] = None)

Writes significant gene list files for every column in a q-value table.

Parameters

fdr_cut_off – FDR threshold for significance
output_prefix – Output prefix for files
comparisons – Which subset of q-value columns to write gene lists for. By default, gene lists are written for all columns.

Returns: None

blacksheep.cli module

blacksheep.comparisons module

blacksheep.comparisons.get_sample_lists(annotations: pandas.core.frame.DataFrame, col: str) → Tuple[Optional[str], Optional[List[str]], Optional[str], Optional[List[str]]]

Finds groupings of samples from an annotation DataFrame column.

Parameters

annotations – A DataFrame with samples as the index and annotations as columns. Each column must contain exactly 2 different values, and optionally missing values. Columns with fewer or more than 2 values will be ignored.
col – The column for which to define groups.

Returns: A label for group0, the list of samples in group0, a label for group1, and the list of samples in group1.
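A pandas sketch of the grouping step with a hypothetical helper name. Returning all None for a column without exactly 2 values is inferred from the Optional return types, and ordering the labels alphabetically is an assumption.

```python
from typing import List, Optional, Tuple

import pandas as pd


def get_sample_lists_sketch(
    annotations: pd.DataFrame, col: str
) -> Tuple[Optional[str], Optional[List[str]], Optional[str], Optional[List[str]]]:
    """Split samples into two groups by the values in one annotation column."""
    groups = annotations[col].dropna()  # missing values belong to neither group
    labels = sorted(groups.unique(), key=str)
    if len(labels) != 2:
        return None, None, None, None  # assumption: no valid comparison
    label0, label1 = labels
    return (
        label0,
        groups.index[groups == label0].tolist(),
        label1,
        groups.index[groups == label1].tolist(),
    )
```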

blacksheep.deva module

blacksheep.deva.compare_groups_outliers(outliers: blacksheep.classes.OutlierTable, annotations: pandas.core.frame.DataFrame, frac_filter: Optional[float] = 0.3, save_qvalues: bool = False, output_prefix: str = 'outliers', save_comparison_summaries: bool = False) → blacksheep.classes.qValues

Takes an OutlierTable object and a sample annotation DataFrame and performs comparisons for every column in annotations with exactly 2 groups. For each group identified in the annotations DataFrame, this function calculates q-values for the enrichment of outliers in that group, for each row.

Parameters

outliers – An OutlierTable, with a DataFrame of outlier and non-outlier counts, as well as parameters for how outliers were calculated.
annotations – A DataFrame with samples as rows and annotations as columns. Each column must contain exactly 2 different categories, not counting missing values. Columns without exactly 2 categories are ignored.
frac_filter – The fraction of samples in the group of interest that must have an outlier value for a row to be considered in the comparison. Float between 0 and 1, or None.
save_qvalues – Whether to write a file with a table of q-values.
output_prefix – If files are written, a prefix for the files.
save_comparison_summaries – Whether to write a file for each annotation column with the counts in the Fisher table, p-values and q-values per row.

Returns: qvals: A qValues object, which includes a DataFrame of q-values for each comparison, as well as some metadata about how the comparisons were performed.

blacksheep.deva.deva(df: pandas.core.frame.DataFrame, annotations: pandas.core.frame.DataFrame, iqrs: float = 1.5, up_or_down: str = 'up', aggregate: bool = True, save_outlier_table: bool = False, save_frac_table: bool = False, frac_filter: Optional[float] = 0.3, save_qvalues: bool = False, output_prefix: str = 'outliers', ind_sep: str = '-', save_comparison_summaries: bool = False) → Tuple[blacksheep.classes.OutlierTable, blacksheep.classes.qValues]

Takes a DataFrame of values and returns OutlierTable and qValues objects. This command runs the whole outliers pipeline. The DataFrame in the OutlierTable object can be reused later to run further comparisons, and the qValues object can be used for visualization or for writing significant gene lists.

Parameters

df – Input DataFrame with samples as columns and sites/genes as rows.
annotations – A DataFrame with samples as rows and annotations as columns. Each column must contain exactly 2 different values, not counting missing values. Other columns are ignored.
iqrs – The number of interquartile ranges (IQRs) above or below the median to consider a value as an outlier.
up_or_down – Whether to call up or down outliers. Up is above the median; down is below the median. Options “up” or “down”.
aggregate – Whether to sum outliers across a grouping (e.g. gene level) rather than reporting individual sites. For instance, if the row labels are phosphosites on proteins in the format “RAG2-S365”, the output shows outlier counts per protein (e.g. RAG2) rather than per site (e.g. RAG2-S365).
save_outlier_table – Whether to write a file with the outlier count table.
save_frac_table – Whether to write a file with the outlier fraction table.
frac_filter – The fraction of samples in the group of interest that must have an outlier value for a row to be considered in the comparison. Float between 0 and 1, or None.
save_qvalues – Whether to write a file with a table of q-values.
output_prefix – If files are written, a prefix for the files.
ind_sep – The separator used in the row labels, for instance to separate a gene from a site. If only genes are used (i.e. there is no separator), or if not aggregating, this parameter has no effect.
save_comparison_summaries – Whether to write a table for each comparison with the counts in the Fisher table, p-values and q-values per row.

Returns: outliers, qvals: An OutlierTable object and a qValues object.

blacksheep.deva.make_outliers_table(df: pandas.core.frame.DataFrame, iqrs: float = 1.5, up_or_down: str = 'up', aggregate: bool = True, save_outlier_table: bool = False, save_frac_table: bool = False, output_prefix: str = 'outliers', ind_sep: str = '-') → blacksheep.classes.OutlierTable

Converts a DataFrame of values into an OutlierTable object, which includes a DataFrame of outlier and non-outlier count values.

Parameters

df – Input DataFrame with samples as columns and sites/genes as rows.
iqrs – The number of interquartile ranges (IQRs) above or below the median to consider a value as an outlier.
up_or_down – Whether to call up or down outliers. Up is above the median; down is below the median. Options “up” or “down”.
aggregate – Whether to sum outliers across a grouping (e.g. gene level) rather than reporting individual sites. For instance, if the row labels are phosphosites on proteins in the format “RAG2-S365”, the output shows outlier counts per protein (e.g. RAG2) rather than per site (e.g. RAG2-S365).
save_outlier_table – Whether to write a file with the outlier count table.
save_frac_table – Whether to write a file with the outlier fraction table.
output_prefix – If files are written, a prefix for the files.
ind_sep – The separator used in site labels, for instance to separate a gene from a site. If only genes are used (i.e. there is no separator), or if not aggregating, this parameter has no effect.

Returns: outliers: An OutlierTable object with outlier and non-outlier counts and metadata about how the outliers were called.

blacksheep.parsers module

blacksheep.parsers.binarize_annotations(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Takes an annotation DataFrame, checks how many distinct values each column has, and adjusts accordingly. Columns with 0 or 1 distinct values are dropped, columns with exactly 2 are kept as-is, and columns with more than 2 are expanded: for each value in such a column, a new column is created with val and not_val options.

Parameters: df – Annotations DataFrame.

Returns: new_df: Refactored annotations DataFrame.

blacksheep.parsers.normalize(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Performs median-of-ratios normalization on a given DataFrame, then a log2 transform.

Parameters: df – Unnormalized values DataFrame

Returns: Normalized DataFrame

blacksheep.parsers.read_in_outliers(path: str, updown: str, iqrs: float) → blacksheep.classes.OutlierTable

Parses a file into an OutlierTable object.

Parameters

path – File path
updown – Whether the outliers represent up or down outliers
iqrs – How many IQRs were used to define an outlier

Returns: outliers: OutlierTable object

blacksheep.parsers.read_in_values(path: str) → pandas.core.frame.DataFrame

Detects the delimiter and parses the file into a DataFrame.

Parameters: path – File path

Returns: df: DataFrame from table in file

blacksheep.parsers.subset_by_genes(outliers: pandas.core.frame.DataFrame, ind_list: Iterable[str], ind_sep: str = None) → pandas.core.frame.DataFrame

blacksheep.visualization module

blacksheep.visualization.plot_heatmap(annotations: pandas.core.frame.DataFrame, qvals: pandas.core.frame.DataFrame, col_of_interest: str, vis_table: pandas.core.frame.DataFrame, fdr: float = 0.05, red_or_blue: str = 'red', output_prefix: str = 'outliers', colors: Optional[str] = None, savefig: bool = False) → list

Plots a heatmap of significantly enriched values for a given comparison.

Parameters

annotations – Annotations DataFrame, samples as rows, annotations as columns
qvals – q-values DataFrame with genes/sites as rows and comparisons as columns
col_of_interest – Which column from qvals should be used to find significant genes
vis_table – Table to be visualized in the heatmap. Index values should correspond to the annotations DataFrame index, and column names should correspond to the qvals DataFrame index
fdr – FDR threshold for significance
red_or_blue – Whether the heatmap should use a red or blue color scale
output_prefix – If saving files, the output prefix
colors – Path to a file with the color map for the annotation header
savefig – Whether to save the plot to a PDF

Returns: [annot_ax, vals_ax, cbar_ax, leg_ax]: List of matplotlib axes that can be further customized before saving. In order, the axes contain the annotation header, the heatmap, the color bar, and the legend.