Welcome to Blacksheep’s documentation!¶
Command Line Interface¶
usage: blacksheep [-h] [--version]
{normalize,outliers_table,binarize,compare_groups,visualize,deva,simulations}
...
Positional Arguments¶
- which
Possible choices: normalize, outliers_table, binarize, compare_groups, visualize, deva, simulations
Named Arguments¶
- --version, -v
show program’s version number and exit
Sub-commands:¶
normalize¶
Takes a table of unnormalized values and applies median of ratios normalization. Saves a log2-normalized table suitable for BlackSheep analysis.
blacksheep normalize [-h] [--output_prefix OUTPUT_PREFIX] unnormed_values
Positional Arguments¶
- unnormed_values
Table of values to be normalized. Sites/genes as rows, samples as columns.
Named Arguments¶
- --output_prefix
Prefix for output file. Suffix will be ‘.normalized.tsv’
Default: “values”
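For readers unfamiliar with median of ratios normalization, here is a minimal Python sketch of one common formulation: build a per-row geometric-mean reference, take each sample's median ratio to that reference as its size factor, divide, then take log2. Whether blacksheep normalize follows exactly these steps is an assumption; the function name median_of_ratios_log2 and the toy table are hypothetical.

```python
import math
from statistics import median

def median_of_ratios_log2(table):
    """table: dict mapping row label -> list of values, one per sample.

    Illustrative only; not taken from the BlackSheep source.
    """
    n_samples = len(next(iter(table.values())))
    # Reference per row: geometric mean across samples. Rows with a missing
    # or non-positive value are left out of the size-factor estimate.
    ref = {}
    for row, values in table.items():
        if all(v is not None and v > 0 for v in values):
            ref[row] = math.exp(sum(math.log(v) for v in values) / len(values))
    # Size factor per sample: median over rows of value / reference.
    size_factors = [
        median(table[row][j] / ref[row] for row in ref)
        for j in range(n_samples)
    ]
    # Divide by the sample's size factor, then log2-transform.
    normalized = {
        row: [
            math.log2(v / size_factors[j]) if v is not None and v > 0 else None
            for j, v in enumerate(values)
        ]
        for row, values in table.items()
    }
    return size_factors, normalized

# Toy table: sample 2 is loaded at 2x and sample 3 at 4x relative to sample 1.
raw = {
    "geneA": [10.0, 20.0, 40.0],
    "geneB": [5.0, 10.0, 20.0],
    "geneC": [8.0, 16.0, 32.0],
}
size_factors, normalized = median_of_ratios_log2(raw)
```

In this toy table the loading differences are removed, so every sample ends up with the same normalized value for a given gene.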
outliers_table¶
Takes a table of values and converts to a table of outlier counts.
blacksheep outliers_table [-h] [--output_prefix OUTPUT_PREFIX] [--iqrs IQRS]
[--up_or_down {up,down}] [--ind_sep IND_SEP]
[--do_not_aggregate] [--write_frac_table]
values
Positional Arguments¶
- values
File path to input values. Columns must be samples, rows must be sites or genes. Only .tsv and .csv accepted.
Named Arguments¶
- --output_prefix
Output prefix for writing files. Default outliers.
Default: outliers
- --iqrs
Number of interquartile ranges (IQRs) above or below the median to consider a value an outlier. Default is 1.5 IQRs.
Default: 1.5
- --up_or_down
Possible choices: up, down
Whether to look for up or down outliers. Choices are up or down. Default up.
Default: “up”
- --ind_sep
If site labels have a parent molecule (e.g. a gene name such as ATM) and a site identifier (e.g. S365) this is the delimiter between the two elements. Default is -
Default: “-”
- --do_not_aggregate
Use flag if you do not want to sum outliers based on site prefixes.
Default: False
- --write_frac_table
Use flag if you want to write a table with fraction of values per site, per sample that are outliers. Will not be written by default. Useful for visualization.
Default: False
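The options above can be illustrated with a small Python sketch: a value is flagged when it lies more than --iqrs interquartile ranges above (or below) the row median, and flags are then summed per parent molecule by splitting row labels on --ind_sep. The strict inequality, the quartile interpolation method, the [outliers, non-outliers] layout and the function names are assumptions made for illustration, not BlackSheep's confirmed behaviour.

```python
from statistics import median, quantiles

def outlier_flags(values, iqrs=1.5, up_or_down="up"):
    """Flag values beyond median +/- iqrs * IQR within one row."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    med = median(present)
    iqr = q3 - q1
    if up_or_down == "up":
        threshold = med + iqrs * iqr
        return [v is not None and v > threshold for v in values]
    threshold = med - iqrs * iqr
    return [v is not None and v < threshold for v in values]

def aggregate_outliers(table, ind_sep="-", iqrs=1.5, up_or_down="up"):
    """Sum per-site flags into [outliers, non_outliers] per parent and sample."""
    counts = {}
    for label, values in table.items():
        parent = label.split(ind_sep)[0]  # e.g. "ATM-S365" -> "ATM"
        flags = outlier_flags(values, iqrs, up_or_down)
        per_sample = counts.setdefault(parent, [[0, 0] for _ in values])
        for j, (value, flag) in enumerate(zip(values, flags)):
            if value is None:
                continue  # missing values count as neither
            per_sample[j][0 if flag else 1] += 1
    return counts

# Hypothetical phosphosite table: rows are sites, columns are five samples.
sites = {
    "ATM-S365": [1.0, 1.1, 0.9, 1.0, 5.0],
    "ATM-S1981": [2.0, 2.0, 2.0, 2.0, 9.0],
    "TP53-S15": [3.0, 3.0, 3.0, 3.0, 3.0],
}
counts = aggregate_outliers(sites)
```

Here both ATM sites are high in the last sample, so ATM gets two outliers and no non-outliers for that sample, while TP53 has no outliers anywhere.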
binarize¶
Takes an annotation table where some columns may have more than 2 possible values (not including empty/null values) and outputs an annotation table with only two values per annotation. Propagates null values.
blacksheep binarize [-h] [--output_prefix OUTPUT_PREFIX] annotations
Positional Arguments¶
- annotations
Annotation table with samples as rows and annotation labels as columns.
Named Arguments¶
- --output_prefix
Output prefix for writing files. Default annotations. Suffix will be ‘.binarized.tsv’
Default: annotations
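As an illustration of what binarizing an annotation means, the following Python sketch expands a column with more than two values into one two-valued column per level and carries missing values through unchanged. The output column names and the "not_" labels are hypothetical; the names BlackSheep actually writes may differ.

```python
def binarize_column(name, values):
    """Expand one annotation column into two-valued columns.

    Illustrative only; the naming scheme is an assumption.
    """
    levels = sorted({v for v in values if v is not None})
    if len(levels) <= 2:
        # Already binary (or constant): keep the column as it is.
        return {name: list(values)}
    binarized = {}
    for level in levels:
        binarized[f"{name}_{level}"] = [
            None if v is None else (level if v == level else f"not_{level}")
            for v in values
        ]
    return binarized

# Hypothetical annotation with three levels and one missing value.
subtype = ["luminal", "basal", None, "her2", "basal"]
binarized = binarize_column("subtype", subtype)
```

Each resulting column has exactly two values plus the propagated missing entry, which is the shape compare_groups expects.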
compare_groups¶
Takes an annotation table and an outlier count table (output of outliers_table) and outputs q-values from a statistical test for enrichment of outlier values in each group of the annotation table. The q-value table has one column for each value in each comparison, provided there are any genes in that comparison.
blacksheep compare_groups [-h] [--ind_subset IND_SUBSET] [--ind_sep IND_SEP]
[--output_prefix OUTPUT_PREFIX]
[--frac_filter FRAC_FILTER]
[--write_comparison_summaries] [--iqrs IQRS]
[--up_or_down {up,down}] [--write_gene_list]
[--make_heatmaps] [--fdr FDR]
[--red_or_blue {red,blue}]
[--annotation_colors ANNOTATION_COLORS]
outliers_table annotations
Positional Arguments¶
- outliers_table
Table of outlier counts (output of outliers_table). Must be .tsv or .csv file, with outlier and non-outlier counts as columns, and genes/sites as rows.
- annotations
Table of annotations. Must be .csv or .tsv. Samples as rows and comparisons as columns. Each comparison must have only two unique values (not including missing values). If a column has more values than that, you can use binarize to prepare the table.
Named Arguments¶
- --ind_subset
File with subset of indexes to consider in comparison
- --ind_sep
Index separator for subsetting genes. Only needed if using ind_subset, and if rows of outliers are NOT aggregated.
- --output_prefix
Output prefix for writing files. Default outliers.
Default: outliers
- --frac_filter
The minimum fraction of samples per group that must have an outlier in a gene to consider that gene in the analysis. This prevents a high number of outlier values in a single sample from driving a low q-value. Default 0.3
Default: 0.3
- --write_comparison_summaries
Use flag to write a separate file for each column in the annotations table, with outlier counts in each group, p-values and q-values in each group.
Default: False
- --iqrs
Number of IQRs used to define outliers in the input count table. Optional.
- --up_or_down
Possible choices: up, down
Whether input outlier table represents up or down outliers. Needed for output file labels. Default up
- --write_gene_list
Use flag to write a list of significantly enriched genes for each value in each comparison. If used, need an fdr threshold as well.
Default: False
- --make_heatmaps
Use flag to draw a heatmap of significantly enriched genes for each value in each comparison. If used, need an fdr threshold as well.
Default: False
- --fdr
FDR cutoff to use for significantly enriched gene lists and heatmaps. Default 0.05
Default: 0.05
- --red_or_blue
Possible choices: red, blue
If --make_heatmaps is used, color of values to draw on heatmap. Default red.
Default: “red”
- --annotation_colors
File with color map to use for annotation header if --make_heatmaps is used. Must have a line with ‘value color’ format for each value in annotations. Any value not represented will be assigned a new color.
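Two of the ideas above, the --frac_filter pre-filter and the conversion of p-values into q-values, can be sketched in Python as follows. This reference does not name the statistical test or the multiple-testing correction; the Benjamini-Hochberg procedure shown here is an assumption, as are the function names and the use of >= for the filter.

```python
def passes_frac_filter(outlier_counts, in_group, frac_filter=0.3):
    """Keep a gene only if enough samples in the group have an outlier.

    outlier_counts: per-sample outlier counts for one gene.
    in_group: per-sample booleans marking group membership.
    """
    group = [c for c, g in zip(outlier_counts, in_group) if g and c is not None]
    if not group:
        return False
    frac_with_outlier = sum(1 for c in group if c > 0) / len(group)
    return frac_with_outlier >= frac_filter

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues

# One sample carrying many outliers does not pass the filter on its own.
single_sample_driven = passes_frac_filter([3, 0, 0, 0, 0, 0], [True] * 6)
broadly_supported = passes_frac_filter([1, 1, 0, 0], [True] * 4)
qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

The first call shows the purpose of the filter: three outliers in one of six samples give a fraction of about 0.17, below the default of 0.3, so that gene would be dropped before testing.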
visualize¶
Used to make custom heatmaps from significant genes.
blacksheep visualize [-h] [--output_prefix OUTPUT_PREFIX]
[--annotations_to_show ANNOTATIONS_TO_SHOW [ANNOTATIONS_TO_SHOW ...]]
[--fdr FDR] [--red_or_blue {red,blue}]
[--annotation_colors ANNOTATION_COLORS]
[--write_gene_list]
comparison_qvalues annotations visualization_table
comparison_of_interest
Positional Arguments¶
- comparison_qvalues
Table of qvalues, output from compare_groups. Must be .csv or .tsv. Has genes/sites as rows and comparison values as columns.
- annotations
Table of annotations used to generate qvalues.
- visualization_table
Values to visualize in heatmap. Samples as columns and genes/sites as rows. Using outlier fraction table is recommended, but original values can also be used if no aggregation was used.
- comparison_of_interest
Name of column in qvalues table from which to visualize significant genes.
Named Arguments¶
- --output_prefix
Output prefix for writing files. Default outliers.
Default: outliers
- --annotations_to_show
Names of columns from the annotation table to show in the header of the heatmap. Default is all columns.
- --fdr
FDR threshold to use to select genes to visualize. Default 0.05
Default: 0.05
- --red_or_blue
Possible choices: red, blue
Color of values to draw on heatmap. Default red.
Default: “red”
- --annotation_colors
File with color map to use for annotation header. Must have a line with ‘value color’ format for each value in annotations. Any value not represented will be assigned a new color.
- --write_gene_list
Use flag to write a list of significantly enriched genes for each value in each comparison.
Default: False
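To make the inputs above concrete, here is a small Python sketch that reads a color map in the ‘value color’ line format and picks the genes below an FDR threshold from one q-value column. Splitting on whitespace, the strict comparison against the threshold, the function names and the example values are assumptions for illustration only.

```python
def parse_annotation_colors(text):
    """Parse lines of the form 'value color' into a dict."""
    colors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace so values may contain spaces.
        value, color = line.rsplit(None, 1)
        colors[value] = color
    return colors

def significant_genes(qvalues, fdr=0.05):
    """Return genes whose q-value is below the FDR threshold."""
    return [gene for gene, q in qvalues.items() if q is not None and q < fdr]

# Hypothetical color map file contents and q-value column.
colors = parse_annotation_colors("mutant #d62728\nwildtype #1f77b4\n")
selected = significant_genes({"ATM": 0.01, "TP53": 0.2, "BRCA1": None})
```

Genes with a missing q-value, such as those removed by the fraction filter, are simply skipped.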
deva¶
Runs the whole outlier analysis pipeline, with flags to write each optional output.
blacksheep deva [-h] [--output_prefix OUTPUT_PREFIX] [--iqrs IQRS]
[--up_or_down {up,down}] [--do_not_aggregate]
[--write_outlier_table] [--write_frac_table]
[--ind_sep IND_SEP] [--frac_filter FRAC_FILTER]
[--write_comparison_summaries] [--fdr FDR] [--write_gene_list]
[--make_heatmaps] [--red_or_blue {red,blue}]
[--annotation_colors ANNOTATION_COLORS]
values annotations
Positional Arguments¶
- values
File path to input values. Samples are columns and genes/sites are rows. Only .tsv and .csv accepted.
- annotations
File path to annotation values. Rows are samples and columns are annotations, e.g. mutation status.
Named Arguments¶
- --output_prefix
Output prefix for writing files. Default outliers.
Default: outliers
- --iqrs
Number of inter-quartile ranges (IQRs) above or below the median to consider a value an outlier. Default is 1.5.
Default: 1.5
- --up_or_down
Possible choices: up, down
Whether to look for up or down outliers. Choices are up or down. Default up.
Default: “up”
- --do_not_aggregate
Use flag if you do not want to sum outliers based on site prefixes.
Default: False
- --write_outlier_table
Use flag to write a table of outlier counts.
Default: False
- --write_frac_table
Use flag if you want to write a table with fraction of values per site per sample that are outliers. Useful for custom visualization.
Default: False
- --ind_sep
If site labels have a parent molecule (e.g. a gene name such as ATM) and a site identifier (e.g. S365) this is the delimiter between the two elements. Default is -
Default: “-”
- --frac_filter
The minimum fraction of samples per group that must have an outlier in a gene to consider that gene in the analysis. This prevents a high number of outlier values in a single sample from driving a low q-value. Default 0.3
Default: 0.3
- --write_comparison_summaries
Use flag to write a separate file for each column in the annotations table, with outlier counts in each group, p-values and q-values in each group.
Default: False
- --fdr
FDR threshold to use to select genes to visualize. Default 0.05
Default: 0.05
- --write_gene_list
Use flag to write a list of significantly enriched genes for each value in each comparison.
Default: False
- --make_heatmaps
Use flag to draw a heatmap of significantly enriched genes for each value in each comparison. If used, need an fdr threshold as well.
Default: False
- --red_or_blue
Possible choices: red, blue
Color of values to draw on heatmap. Default red.
Default: “red”
- --annotation_colors
File with color map to use for annotation header. Must have a line with ‘value color’ format for each value in annotations. Any value not represented will be assigned a new color.
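When deva is driven from a script, the invocation can be assembled as an argument list. The Python sketch below uses only flags listed above; the helper name build_deva_command and the file names are hypothetical, and actually running the command requires blacksheep to be installed.

```python
import shlex

def build_deva_command(values, annotations, output_prefix="outliers",
                       iqrs=1.5, up_or_down="up", fdr=0.05,
                       write_gene_list=False, make_heatmaps=False):
    """Assemble a 'blacksheep deva' argument list from the documented flags."""
    cmd = [
        "blacksheep", "deva",
        "--output_prefix", output_prefix,
        "--iqrs", str(iqrs),
        "--up_or_down", up_or_down,
        "--fdr", str(fdr),
    ]
    if write_gene_list:
        cmd.append("--write_gene_list")
    if make_heatmaps:
        cmd.append("--make_heatmaps")
    # Positional arguments come last: values table, then annotations table.
    cmd += [values, annotations]
    return cmd

# Hypothetical input files.
cmd = build_deva_command(
    "phospho_values.tsv",
    "sample_annotations.tsv",
    output_prefix="phospho_up",
    make_heatmaps=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True); passing a list rather than a single string avoids shell quoting problems with file names.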
simulations¶
Add here.
blacksheep simulations [-h] [--ind_sep IND_SEP] [--iqrs IQRS] [--reps REPS]
[--output_prefix OUTPUT_PREFIX]
[--molecules MOLECULES [MOLECULES ...]] [--pval PVAL]
values
Positional Arguments¶
- values
File path to input values. Samples are columns and genes/sites are rows. Only .tsv and .csv accepted.
Named Arguments¶
- --ind_sep
Delimiter between the parent molecule (e.g. a gene name such as ATM) and a site identifier (e.g. S365). Default is -
Default: “-”
- --iqrs
Number of inter-quartile ranges (IQRs) above or below the median to consider a value an outlier. Default is 1.5.
Default: 1.5
- --reps
Number of repetitions for the simulation to perform. Default is 1,000,000.
Default: 1000000
- --output_prefix
Output prefix for writing files. Default is ‘simulated_pvals’.
Default: simulated_pvals
- --molecules
List of parent molecules of interest. Empty list or absence of argument defaults to all parent molecules in input file.
Default: []
- --pval
p-value threshold for significant results. Must be between 0 and 1. Default is 0.05.
Default: 0.05