Split-pool combinatorial barcoding makes it possible to scale projects to hundreds of samples and millions of cells, overcoming limitations of earlier droplet-based technologies. Spipe (split-pipe) implements the combinatorial barcoding method for single-cell RNA sequencing (scRNA-seq) with dramatically improved sensitivity.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive
[user@cn3335 ~]$ module load spipe
[+] Loading singularity 4.0.1 on cn3335
[+] Loading spipe 1.3.1
[user@cn3335 ~]$ split-pipe -h
usage: split-pipe [-h] [-m MODE] [-c CHEMISTRY] [--kit KIT] [-p PARFILE] [--run_name RUN_NAME] [--fq1 FQ1] [--fq2 FQ2]
[--output_dir OUTPUT_DIR] [--genome_dir GENOME_DIR] [--parent_dir PARENT_DIR]
[--targeted_list TARGETED_LIST] [--sample SAMPLE_NAME WELLS] [--samp_list SAMP_LIST]
[--samp_sltab SAMP_SLTAB] [--genome_name [GENOME_NAME ...]] [--genes [GENES ...]] [--fasta [FASTA ...]]
[--gfasta GENOME_NAME FASTA] [--sublibraries [SUBLIBRARIES ...]] [--sublib_list SUBLIB_LIST]
[--sublib_pref SUBLIB_PREF] [--sublib_suff SUBLIB_SUFF] [--tscp_use TSCP_USE] [--tscp_min TSCP_MIN]
[--tscp_max TSCP_MAX] [--cell_use CELL_USE] [--cell_est CELL_EST] [--cell_xf CELL_XF]
[--cell_min CELL_MIN] [--cell_max CELL_MAX] [--cell_list CELL_LIST] [--crispr] [--crsp_guides CRSP_GUIDES]
[--crsp_read_thresh CRSP_READ_THRESH] [--crsp_tscp_thresh CRSP_TSCP_THRESH] [--crsp_max_mm]
[--crsp_use_star] [--immune_check] [--bcr_analysis] [--tcr_analysis] [--immune_genome IMMUNE_GENOME]
[--use_imgt_db] [--immune_read_thresh IMMUNE_READ_THRESH] [--no_save_anndata] [--kit_list] [--chem_list]
[--bc_list] [--bc_round_set ROUND NAME] [--rseed RSEED] [--nthreads NTHREADS] [--no_keep_going] [--reuse]
[--keep_temps] [--one_step] [--until_step UNTIL_STEP] [--clear_runproc] [--start_timeout START_TIMEOUT]
[--kit_score_skip] [--dryrun] [-e] [-V]
SplitPipe data processing pipeline v1.3.1
options:
-h, --help show this help message and exit
-m MODE, --mode MODE Mode dictates process(s) to run; REQUIRED; See -explain
-c CHEMISTRY, --chemistry CHEMISTRY
Set chemistry version for data
--kit KIT Set kit and kit-specific parameters
-p PARFILE, --parfile PARFILE
Parameter file
--run_name RUN_NAME Name for run / sublibrary
--fq1 FQ1 fastq1 - mRNA reads
--fq2 FQ2 fastq2 - Reads containing barcodes and polyN
--output_dir OUTPUT_DIR
Output dir (created as needed)
--genome_dir GENOME_DIR
Path containing reference genome
--parent_dir PARENT_DIR
Path to output_dir to use as parent; Use existing cell calls, etc
--targeted_list TARGETED_LIST
Target enrichment gene list; csv file with and/or
--sample SAMPLE_NAME WELLS
Add sample_name and well range; See '--explain' for format
--samp_list SAMP_LIST
Get samples from file with per line; See --explain
--samp_sltab SAMP_SLTAB
Get samples from SampleLoadingTable excel file
--genome_name [GENOME_NAME ...]
mkref name(s) of genome(s)/species
--genes [GENES ...] mkref GTF file(s) with gene annotations
--fasta [FASTA ...] mkref fasta file(s) for genome(s)
--gfasta GENOME_NAME FASTA
mkref genome-fasta file; Gene info taken from fasta header line
--sublibraries [SUBLIBRARIES ...]
Paths to output directories of each sublibrary (Combine mode only)
--sublib_list SUBLIB_LIST
File listing sublibrary paths, one per line (Combine mode only)
--sublib_pref SUBLIB_PREF
Sublibrary list paths prefix (Combine mode only)
--sublib_suff SUBLIB_SUFF
Sublibrary list paths suffix (Combine mode only)
--tscp_use TSCP_USE Transcript cutoff to use (Not calculated; given)
--tscp_min TSCP_MIN Transcript cutoff min value (Limit for filtered DGE)
--tscp_max TSCP_MAX Transcript cutoff max value (Limit for filtered DGE)
--cell_use CELL_USE Cell count to use (+/- X-fold for filtered DGE)
--cell_est CELL_EST Cell count estimate (Min to X-fold for filtered DGE)
--cell_xf CELL_XF Cell estimate X-fold factor (For filtered DGE)
--cell_min CELL_MIN Cell count minimum (Lower limit for filtered DGE)
--cell_max CELL_MAX Cell count maximum (Upper limit for filtered DGE)
--cell_list CELL_LIST
List of cell barcodes to use (No tscp cutoff calculated)
--crispr Run CRISPR analysis, mapping guide RNA to parent dir cells
--crsp_guides CRSP_GUIDES
File with crispr guides and 5' 3' context sequences; csv
--crsp_read_thresh CRSP_READ_THRESH
Minimum reads to qualify crispr transcripts
--crsp_tscp_thresh CRSP_TSCP_THRESH
Minimum transcripts to qualify crispr guide
--crsp_max_mm Maximum mismatch (Hamming distance) for crispr guide mapping
--crsp_use_star Use STAR for crispr guide aligment
--immune_check Check immune database (BCR / TCR) installation status
--bcr_analysis Run BCR analysis
--tcr_analysis Run TCR analysis
--immune_genome IMMUNE_GENOME
Immune (BCR / TCR) genome name
--use_imgt_db Use IMGT databse for immune (BCR / TCR) analysis
--immune_read_thresh IMMUNE_READ_THRESH
Minimum reads to qualify immune transcripts
--no_save_anndata Do not save anndata h5ad files
--kit_list List valid kit names and chemistry versions
--chem_list List valid kit names and chemistry versions
--bc_list List installed barcode sets
--bc_round_set ROUND NAME
Specify barcode use as , where N = 1,2,3
--rseed RSEED Random number seed
--nthreads NTHREADS Number of threads to use (default = number of CPUs)
--no_keep_going Turn off keep_going (Stop on any error)
--reuse Reuse existing files if found (vs generate fresh)
--keep_temps Keep temp files
--one_step Do one step (mode) of pipeline, then stop
--until_step UNTIL_STEP
Run until this step (mode) then stop
--clear_runproc Clear run process def files (Only); Need output_dir
--start_timeout START_TIMEOUT
Time for statup env check steps; Zero to skip
--kit_score_skip Ignore kit score failure; WARNING Use with caution!
--dryrun Dry run; Only setup and report status; Saves run process file
-e, --explain Explain assumptions and usage details
-V, --version show program's version number and exit
[user@cn3335 ~]$ exit
[user@biowulf]$
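
The help text above lists the options but does not show a full invocation. The following is a minimal sketch of how one sublibrary might be processed, using only flags that appear in the help output. The FASTQ paths, genome directory, output directory, sample name, and well range are hypothetical placeholders. The mode value `all` and chemistry value `v2` are assumptions, so confirm them with `split-pipe --explain` and `split-pipe --chem_list`. `SLURM_CPUS_PER_TASK` is the standard Slurm variable holding the CPUs allocated to the job. The command includes `--dryrun`, so it only sets up and reports status.

```shell
#!/bin/sh
# Sketch only: file names, sample name, and well range below are hypothetical.
FQ1="fastq/sublib1_R1.fastq.gz"      # mRNA reads (--fq1)
FQ2="fastq/sublib1_R2.fastq.gz"      # barcode / polyN reads (--fq2)
GENOME_DIR="genomes/hg38"            # reference genome directory, built beforehand
OUTPUT_DIR="results/sublib1"         # created as needed by split-pipe
# Match the thread count to the Slurm allocation; fall back to 2 outside a job.
NTHREADS="${SLURM_CPUS_PER_TASK:-2}"

# Assemble the arguments as positional parameters so each value stays one word.
# "all" and "v2" are assumed values; confirm with --explain and --chem_list.
set -- --mode all --chemistry v2 \
    --fq1 "$FQ1" --fq2 "$FQ2" \
    --genome_dir "$GENOME_DIR" --output_dir "$OUTPUT_DIR" \
    --sample sample1 A1-A6 \
    --nthreads "$NTHREADS" --dryrun

# Show the command that would be run.
echo "split-pipe $*"

# Only invoke the tool when it is on PATH (i.e. after 'module load spipe').
if command -v split-pipe >/dev/null 2>&1; then
    split-pipe "$@"
fi
```

Once the dry run reports a sensible setup, removing `--dryrun` from the argument list would start the actual processing.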