Command Line Interface

usage: blacksheep [-h] [--version]

Positional Arguments


Possible choices: normalize, outliers_table, binarize, compare_groups, visualize, deva, simulations

Named Arguments

--version, -v

show program’s version number and exit



Takes a table of unnormalized values and normalizes it using median of ratios normalization. Saves a log2-normalized table suitable for BlackSheep analysis.

blacksheep normalize [-h] [--output_prefix OUTPUT_PREFIX] unnormed_values

Positional Arguments


Table of values to be normalized. Sites/genes as rows, samples as columns.

Named Arguments


Prefix for output file. Suffix will be ‘.normalized.tsv’

Default: “values”
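
The median of ratios step described for normalize can be sketched in plain Python. This is a minimal sketch that assumes the DESeq-style variant (row geometric means as the reference, per-sample median ratio as the size factor); BlackSheep's own implementation may differ, for example in how it handles missing or zero values. The function name and the example table are hypothetical.

```python
import math
from statistics import median


def median_of_ratios_log2(table):
    """table: dict mapping row label -> list of positive values, one per sample.

    Assumption: DESeq-style median of ratios, followed by log2.
    """
    n_samples = len(next(iter(table.values())))
    # Geometric mean of each row across samples acts as a reference sample.
    geo_means = {
        row: math.exp(sum(math.log(v) for v in vals) / len(vals))
        for row, vals in table.items()
    }
    # Size factor per sample: median over rows of value / row geometric mean.
    size_factors = [
        median(vals[j] / geo_means[row] for row, vals in table.items())
        for j in range(n_samples)
    ]
    # Divide each value by its sample's size factor, then take log2.
    return {
        row: [math.log2(v / size_factors[j]) for j, v in enumerate(vals)]
        for row, vals in table.items()
    }


# Hypothetical input: sites/genes as rows, three samples as columns.
raw = {
    "ATM-S365": [10.0, 20.0, 40.0],
    "TP53-S15": [100.0, 200.0, 400.0],
    "BRCA1-S988": [5.0, 10.0, 20.0],
}
normalized = median_of_ratios_log2(raw)
```

In this toy table the samples differ only by a constant scale, so after normalization every row has the same value in all three samples.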


Takes a table of values and converts to a table of outlier counts.

blacksheep outliers_table [-h] [--output_prefix OUTPUT_PREFIX] [--iqrs IQRS]
                          [--up_or_down {up,down}] [--ind_sep IND_SEP]
                          [--do_not_aggregate] [--write_frac_table]

Positional Arguments


File path to input values. Columns must be samples, rows must be sites or genes. Only .tsv and .csv accepted.

Named Arguments


Output prefix for writing files. Default outliers.

Default: outliers


Number of interquartile ranges (IQRs) above or below the median to consider a value an outlier. Default is 1.5 IQRs.

Default: 1.5
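
To make the IQR rule concrete, here is a minimal sketch of flagging outliers in a single row of values. It assumes the threshold is the row median plus (or minus) iqrs times the row IQR, as the help text describes; the quartile method and the treatment of missing values are assumptions, and the function name is hypothetical.

```python
from statistics import median, quantiles


def outlier_flags(values, iqrs=1.5, up_or_down="up"):
    """Flag values beyond median +/- iqrs * IQR for one row (site or gene).

    Missing values (None) are ignored when computing the threshold and are
    never flagged. The 'inclusive' quartile method is an assumption.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    med = median(present)
    if up_or_down == "up":
        threshold = med + iqrs * iqr
        return [v is not None and v > threshold for v in values]
    threshold = med - iqrs * iqr
    return [v is not None and v < threshold for v in values]


# Hypothetical row: six samples, the last one clearly elevated.
row = [1.0, 1.2, 0.9, 1.1, 1.0, 5.0]
up_flags = outlier_flags(row)
down_flags = outlier_flags(row, up_or_down="down")
```

Only the last sample is flagged as an up outlier here, and no sample is flagged as a down outlier.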


Possible choices: up, down

Whether to look for up or down outliers. Choices are up or down. Default up.

Default: “up”


If site labels have a parent molecule (e.g. a gene name such as ATM) and a site identifier (e.g. S365) this is the delimiter between the two elements. Default is -

Default: “-”


Use flag if you do not want to sum outliers based on site prefixes.

Default: False
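
The aggregation that this flag switches off can be illustrated as follows. This is a sketch under the assumption that site labels are split on the index separator and that outlier and non-outlier counts are summed per parent molecule and per sample; the function name, the data layout, and the example labels are hypothetical.

```python
def aggregate_outlier_counts(flags_by_site, ind_sep="-"):
    """Sum per-sample outlier / non-outlier counts over sites sharing a prefix.

    flags_by_site: dict mapping site label -> list of True/False/None,
    one entry per sample. Returns dict mapping parent label -> list of
    [n_outliers, n_non_outliers] pairs, one pair per sample.
    """
    totals = {}
    for site, flags in flags_by_site.items():
        # Everything before the first separator is treated as the parent.
        parent = site.split(ind_sep, 1)[0]
        per_sample = totals.setdefault(parent, [[0, 0] for _ in flags])
        for counts, flag in zip(per_sample, flags):
            if flag is None:
                continue  # missing value: counted as neither
            counts[0 if flag else 1] += 1
    return totals


# Hypothetical outlier flags for three sites across three samples.
flags = {
    "ATM-S365": [True, False, None],
    "ATM-S1981": [True, True, False],
    "TP53-S15": [False, False, True],
}
counts = aggregate_outlier_counts(flags)
```

With --do_not_aggregate, each site would instead keep its own row of counts.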


Use flag if you want to write a table with fraction of values per site, per sample that are outliers. Will not be written by default. Useful for visualization.

Default: False


Takes an annotation table where some columns may have more than 2 possible values (not including empty/null values) and outputs an annotation table with only two values per annotation. Propagates null values.

blacksheep binarize [-h] [--output_prefix OUTPUT_PREFIX] annotations

Positional Arguments


Annotation table with samples as rows and annotation labels as columns.

Named Arguments


Output prefix for writing files. Default annotations. Suffix will be ‘.binarized.tsv’

Default: annotations
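
As an illustration of what binarizing an annotation column means, the sketch below splits a multi-valued column into one two-valued column per distinct value and propagates missing values. The output column names and the "not_" labels are assumptions made for this example, not necessarily the naming BlackSheep uses.

```python
def binarize_column(name, values):
    """Split one annotation column into two-valued columns.

    values: list with one entry per sample; None marks a missing value.
    Columns that already have at most two distinct values are returned as-is.
    """
    distinct = sorted({v for v in values if v is not None})
    if len(distinct) <= 2:
        return {name: list(values)}
    return {
        # Hypothetical naming scheme: "<column>_<value>" with "not_<value>".
        f"{name}_{d}": [
            None if v is None else (d if v == d else f"not_{d}")
            for v in values
        ]
        for d in distinct
    }


# Hypothetical annotation column with three subtypes and one missing value.
subtype = ["A", "B", "C", None, "A"]
binarized = binarize_column("subtype", subtype)
unchanged = binarize_column("mutated", ["yes", "no", None, "yes", "no"])
```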


Takes an annotation table and an outlier count table (output of outliers_table) and outputs q-values from a statistical test for enrichment of outlier values in each group in the annotation table. The q-value table has one column for each value in each comparison, provided there are any genes in that comparison.

blacksheep compare_groups [-h] [--ind_subset IND_SUBSET] [--ind_sep IND_SEP]
                          [--output_prefix OUTPUT_PREFIX]
                          [--frac_filter FRAC_FILTER]
                          [--write_comparison_summaries] [--iqrs IQRS]
                          [--up_or_down {up,down}] [--write_gene_list]
                          [--make_heatmaps] [--fdr FDR]
                          [--red_or_blue {red,blue}]
                          [--annotation_colors ANNOTATION_COLORS]
                          outliers_table annotations
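
The help text does not name the statistical test. As one plausible illustration, the sketch below runs a one-sided Fisher's exact test on a 2x2 table of outlier and non-outlier counts inside and outside a group; treating this as the test is an assumption, and the function name and counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: outliers in the group        b: non-outliers in the group
    c: outliers outside the group   d: non-outliers outside the group
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, row1)
    upper = min(row1, col1)
    numer = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return numer / denom


# Hypothetical counts for one gene: outliers concentrated in the group.
p_enriched = fisher_exact_greater(8, 2, 1, 9)
# No outliers anywhere: nothing to test, so the p-value is 1.
p_null = fisher_exact_greater(0, 5, 0, 5)
```

One p-value per gene and comparison would then be corrected for multiple testing to give the q-values in the output table.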

Positional Arguments


Table of outlier counts (output of outliers_table). Must be .tsv or .csv file, with outlier and non-outlier counts as columns, and genes/sites as rows.


Table of annotations. Must be .csv or .tsv. Samples as rows and comparisons as columns. Comparisons must have only two unique values (not including missing values). If a comparison has more values than that, you can use binarize to prepare the table.

Named Arguments


File with subset of indexes to consider in comparison


Index separator for subsetting genes. Only needed if using ind_subset, and if rows of outliers are NOT aggregated.


Output prefix for writing files. Default outliers.

Default: outliers


The minimum fraction of samples per group that must have an outlier in a gene to consider that gene in the analysis. This is used to prevent a high number of outlier values in one sample from driving a low q-value. Default 0.3

Default: 0.3
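
A minimal sketch of the fraction filter, assuming a gene is kept for a group when the share of that group's samples with at least one outlier is greater than or equal to the threshold. Whether the comparison is inclusive is an assumption, and the function name and counts are hypothetical.

```python
def passes_frac_filter(outlier_counts, frac_filter=0.3):
    """outlier_counts: per-sample outlier counts for one gene in one group."""
    if not outlier_counts:
        return False
    # Count samples with any outlier, not the total number of outliers.
    with_outlier = sum(1 for n in outlier_counts if n > 0)
    return with_outlier / len(outlier_counts) >= frac_filter


# One sample with many outliers: 1 of 5 samples (0.2) is below 0.3.
driven_by_one_sample = passes_frac_filter([12, 0, 0, 0, 0])
# Outliers spread over two samples: 2 of 5 samples (0.4) passes.
spread_over_samples = passes_frac_filter([1, 1, 0, 0, 0])
```

The first case shows the situation the filter is meant to exclude: a large outlier count that comes from a single sample.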


Use flag to write a separate file for each column in the annotations table, with outlier counts in each group, p-values and q-values in each group.

Default: False


Number of IQRs used to define outliers in the input count table. Optional.


Possible choices: up, down

Whether input outlier table represents up or down outliers. Needed for output file labels. Default up


Use flag to write a list of significantly enriched genes for each value in each comparison. If used, need an fdr threshold as well.

Default: False


Use flag to draw a heatmap of significantly enriched genes for each value in each comparison. If used, need an fdr threshold as well.

Default: False


FDR cutoff to use for significantly enriched gene lists and heatmaps. Default 0.05

Default: 0.05
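
To show how an FDR cutoff selects genes, the sketch below converts p-values to q-values with the Benjamini-Hochberg procedure and keeps genes at or below the threshold. The choice of Benjamini-Hochberg is an assumption (the help text does not state the correction method), and the gene names and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), input order kept."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


genes = ["geneA", "geneB", "geneC", "geneD"]  # hypothetical names
pvals = [0.01, 0.04, 0.03, 0.5]               # hypothetical p-values
qvals = benjamini_hochberg(pvals)

fdr = 0.05
significant = [g for g, q in zip(genes, qvals) if q <= fdr]
```

Note that geneB and geneC have p-values below 0.05 but q-values above it, so only geneA would appear in the gene list or heatmap at the default cutoff.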


Possible choices: red, blue

If --make_heatmaps is called, color of values to draw on heatmap. Default red.

Default: “red”


File with color map to use for annotation header if --make_heatmaps is used. Must have a line with ‘value color’ format for each value in annotations. Any value not represented will be assigned a new color.


Used to make custom heatmaps from significant genes.

blacksheep visualize [-h] [--output_prefix OUTPUT_PREFIX]
                     [--annotations_to_show ANNOTATIONS_TO_SHOW [ANNOTATIONS_TO_SHOW ...]]
                     [--fdr FDR] [--red_or_blue {red,blue}]
                     [--annotation_colors ANNOTATION_COLORS]
                     comparison_qvalues annotations visualization_table

Positional Arguments


Table of qvalues, output from compare_groups. Must be .csv or .tsv. Has genes/sites as rows and comparison values as columns.


Table of annotations used to generate qvalues.


Values to visualize in heatmap. Samples as columns and genes/sites as rows. Using outlier fraction table is recommended, but original values can also be used if no aggregation was used.


Name of column in qvalues table from which to visualize significant genes.

Named Arguments


Output prefix for writing files. Default outliers.

Default: outliers


Names of columns from the annotation table to show in the header of the heatmap. Default is all columns.


FDR threshold to use to select genes to visualize. Default 0.05

Default: 0.05


Possible choices: red, blue

Color of values to draw on heatmap. Default red.

Default: “red”


File with color map to use for annotation header. Must have a line with ‘value color’ format for each value in annotations. Any value not represented will be assigned a new color.
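
The ‘value color’ file format can be illustrated with a small parser. This sketch assumes one pair per line, separated by whitespace, with the color as the last token; the exact separator and the accepted color spellings are assumptions, and the annotation values shown are hypothetical.

```python
def parse_annotation_colors(text):
    """Parse lines of the form 'value color' into a dict value -> color."""
    colors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so values may contain spaces.
        value, color = line.rsplit(None, 1)
        colors[value] = color
    return colors


# Hypothetical file contents for a two-valued annotation.
example = "mutant red\nwildtype #1f77b4\n"
color_map = parse_annotation_colors(example)
```

Any annotation value missing from such a file would, per the help text, be assigned a new color automatically.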


Use flag to write a list of significantly enriched genes for each value in each comparison.

Default: False


Runs the whole outliers pipeline, with options to write every available output.

blacksheep deva [-h] [--output_prefix OUTPUT_PREFIX] [--iqrs IQRS]
                [--up_or_down {up,down}] [--do_not_aggregate]
                [--write_outlier_table] [--write_frac_table]
                [--ind_sep IND_SEP] [--frac_filter FRAC_FILTER]
                [--write_comparison_summaries] [--fdr FDR] [--write_gene_list]
                [--make_heatmaps] [--red_or_blue {red,blue}]
                [--annotation_colors ANNOTATION_COLORS]
                values annotations

Positional Arguments


File path to input values. Samples are columns and genes/sites are rows. Only .tsv and .csv accepted.


File path to annotation values. Rows are sample names, header is different annotations. e.g. mutation status.

Named Arguments


Output prefix for writing files. Default outliers.

Default: outliers


Number of interquartile ranges (IQRs) above or below the median to consider a value an outlier. Default is 1.5.

Default: 1.5


Possible choices: up, down

Whether to look for up or down outliers. Choices are up or down. Default up.

Default: “up”


Use flag if you do not want to sum outliers based on site prefixes.

Default: False


Use flag to write a table of outlier counts.

Default: False


Use flag if you want to write a table with fraction of values per site per sample that are outliers. Useful for custom visualization.

Default: False


If site labels have a parent molecule (e.g. a gene name such as ATM) and a site identifier (e.g. S365) this is the delimiter between the two elements. Default is -

Default: “-”


The minimum fraction of samples per group that must have an outlier in a gene to consider that gene in the analysis. This is used to prevent a high number of outlier values in one sample from driving a low q-value. Default 0.3

Default: 0.3


Use flag to write a separate file for each column in the annotations table, with outlier counts in each group, p-values and q-values in each group.

Default: False


FDR threshold to use to select genes to visualize. Default 0.05

Default: 0.05


Use flag to write a list of significantly enriched genes for each value in each comparison.

Default: False


Use flag to draw a heatmap of significantly enriched genes for each value in each comparison. If used, need an fdr threshold as well.

Default: False


Possible choices: red, blue

Color of values to draw on heatmap. Default red.

Default: “red”


File with color map to use for annotation header. Must have a line with ‘value color’ format for each value in annotations. Any value not represented will be assigned a new color.
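
For scripted runs, a deva command line can be assembled from the flags listed in the usage above. This is only a sketch: the helper function and the file names are hypothetical, and it covers a subset of the options.

```python
import shlex


def deva_command(values, annotations, output_prefix="outliers", iqrs=1.5,
                 up_or_down="up", fdr=0.05, write_gene_list=False,
                 make_heatmaps=False):
    """Build an argument list for 'blacksheep deva' (hypothetical helper)."""
    cmd = [
        "blacksheep", "deva",
        "--output_prefix", output_prefix,
        "--iqrs", str(iqrs),
        "--up_or_down", up_or_down,
        "--fdr", str(fdr),
    ]
    if write_gene_list:
        cmd.append("--write_gene_list")
    if make_heatmaps:
        cmd.append("--make_heatmaps")
    # Positional arguments come last: values table, then annotations table.
    cmd += [values, annotations]
    return cmd


# Hypothetical input files.
cmd = deva_command("phospho.tsv", "annotations.tsv",
                   output_prefix="run1", make_heatmaps=True)
print(shlex.join(cmd))
# The list could be passed to subprocess.run(cmd, check=True) to execute it.
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file paths.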


Add here.

blacksheep simulations [-h] [--ind_sep IND_SEP] [--iqrs IQRS] [--reps REPS]
                       [--output_prefix OUTPUT_PREFIX]
                       [--molecules MOLECULES [MOLECULES ...]] [--pval PVAL]

Positional Arguments


File path to input values. Samples are columns and genes/sites are rows. Only .tsv and .csv accepted.

Named Arguments


Delimiter between the parent molecule (e.g. a gene name such as ATM) and a site identifier (e.g. S365). Default is -

Default: “-”


Number of interquartile ranges (IQRs) above or below the median to consider a value an outlier. Default is 1.5.

Default: 1.5


Number of repetitions for the simulation to perform. Default is 1,000,000.

Default: 1000000


Output prefix for writing files. Default is ‘simulated_pvals’.

Default: simulated_pvals


List of parent molecules of interest. Empty list or absence of argument defaults to all parent molecules in input file.

Default: []


p-value threshold for significant results. Must be between 0 and 1. Default is 0.05.

Default: 0.05