split-pipe: high sensitivity single cell RNA sequencing with split pool barcoding

Split-pool combinatorial barcoding makes it possible to scale projects to hundreds of samples and millions of cells, overcoming limitations of earlier droplet-based technologies. Spipe (split-pipe) implements a combinatorial barcoding method for single-cell RNA sequencing (scRNA-seq) with dramatically improved sensitivity.

References:

Vuong Tran, Efthymia Papalexi, Sarah Schroeder, Grace Kim, Ajay Sapre, Joey Pangallo, Alex Sova, Peter Matulich, Lauren Kenyon, Zeynep Sayar, Ryan Koehler, Daniel Diaz, Archita Gadkari, Kamy Howitz, Maria Nigos, Charles M. Roco, and Alexander B. Rosenberg
High sensitivity single cell RNA sequencing with split pool barcoding
bioRxiv preprint doi: https://doi.org/10.1101/2022.08.27.505512

Important Notes

Module Name: spipe (see the modules page for more information)
Unusual environment variables set:
- SPIPE_HOME: installation directory
- SPIPE_BIN: executable directory
- SPIPE_DATA: sample data directory
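
Once the module is loaded, the three variables above can be checked from the shell. The following is a minimal sketch; the helper name show_spipe_env is made up for illustration and is not part of spipe.

```shell
# Hypothetical helper (not part of spipe): print each variable the
# module is documented to set, or "(not set)" if it is missing.
show_spipe_env() {
    for var in SPIPE_HOME SPIPE_BIN SPIPE_DATA; do
        # printenv exits non-zero when the variable is not in the environment
        if value=$(printenv "$var"); then
            printf '%s=%s\n' "$var" "$value"
        else
            printf '%s=(not set)\n' "$var"
        fi
    done
}

show_spipe_env
```

If any variable is reported as not set, the module is probably not loaded in the current shell.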

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive 
[user@cn3335 ~]$ module load spipe
[+] Loading singularity  4.0.1  on cn3335
[+] Loading spipe  1.3.1
[user@cn3335 ~]$ split-pipe -h 
usage: split-pipe [-h] [-m MODE] [-c CHEMISTRY] [--kit KIT] [-p PARFILE] [--run_name RUN_NAME] [--fq1 FQ1] [--fq2 FQ2]
                  [--output_dir OUTPUT_DIR] [--genome_dir GENOME_DIR] [--parent_dir PARENT_DIR]
                  [--targeted_list TARGETED_LIST] [--sample SAMPLE_NAME WELLS] [--samp_list SAMP_LIST]
                  [--samp_sltab SAMP_SLTAB] [--genome_name [GENOME_NAME ...]] [--genes [GENES ...]] [--fasta [FASTA ...]]
                  [--gfasta GENOME_NAME FASTA] [--sublibraries [SUBLIBRARIES ...]] [--sublib_list SUBLIB_LIST]
                  [--sublib_pref SUBLIB_PREF] [--sublib_suff SUBLIB_SUFF] [--tscp_use TSCP_USE] [--tscp_min TSCP_MIN]
                  [--tscp_max TSCP_MAX] [--cell_use CELL_USE] [--cell_est CELL_EST] [--cell_xf CELL_XF]
                  [--cell_min CELL_MIN] [--cell_max CELL_MAX] [--cell_list CELL_LIST] [--crispr] [--crsp_guides CRSP_GUIDES]
                  [--crsp_read_thresh CRSP_READ_THRESH] [--crsp_tscp_thresh CRSP_TSCP_THRESH] [--crsp_max_mm]
                  [--crsp_use_star] [--immune_check] [--bcr_analysis] [--tcr_analysis] [--immune_genome IMMUNE_GENOME]
                  [--use_imgt_db] [--immune_read_thresh IMMUNE_READ_THRESH] [--no_save_anndata] [--kit_list] [--chem_list]
                  [--bc_list] [--bc_round_set ROUND NAME] [--rseed RSEED] [--nthreads NTHREADS] [--no_keep_going] [--reuse]
                  [--keep_temps] [--one_step] [--until_step UNTIL_STEP] [--clear_runproc] [--start_timeout START_TIMEOUT]
                  [--kit_score_skip] [--dryrun] [-e] [-V]

SplitPipe data processing pipeline v1.3.1

options:
  -h, --help            show this help message and exit
  -m MODE, --mode MODE  Mode dictates process(s) to run; REQUIRED; See -explain
  -c CHEMISTRY, --chemistry CHEMISTRY
                        Set chemistry version for data
  --kit KIT             Set kit and kit-specific parameters
  -p PARFILE, --parfile PARFILE
                        Parameter file
  --run_name RUN_NAME   Name for run / sublibrary
  --fq1 FQ1             fastq1 - mRNA reads
  --fq2 FQ2             fastq2 - Reads containing barcodes and polyN
  --output_dir OUTPUT_DIR
                        Output dir (created as needed)
  --genome_dir GENOME_DIR
                        Path containing reference genome
  --parent_dir PARENT_DIR
                        Path to output_dir to use as parent; Use existing cell calls, etc
  --targeted_list TARGETED_LIST
                        Target enrichment gene list; csv file with  and/or 
  --sample SAMPLE_NAME WELLS
                        Add sample_name and well range; See '--explain' for format
  --samp_list SAMP_LIST
                        Get samples from file with   per line; See --explain
  --samp_sltab SAMP_SLTAB
                        Get samples from SampleLoadingTable excel file
  --genome_name [GENOME_NAME ...]
                        mkref name(s) of genome(s)/species
  --genes [GENES ...]   mkref GTF file(s) with gene annotations
  --fasta [FASTA ...]   mkref fasta file(s) for genome(s)
  --gfasta GENOME_NAME FASTA
                        mkref genome-fasta file; Gene info taken from fasta header line
  --sublibraries [SUBLIBRARIES ...]
                        Paths to output directories of each sublibrary (Combine mode only)
  --sublib_list SUBLIB_LIST
                        File listing sublibrary paths, one per line (Combine mode only)
  --sublib_pref SUBLIB_PREF
                        Sublibrary list paths prefix (Combine mode only)
  --sublib_suff SUBLIB_SUFF
                        Sublibrary list paths suffix (Combine mode only)
  --tscp_use TSCP_USE   Transcript cutoff to use (Not calculated; given)
  --tscp_min TSCP_MIN   Transcript cutoff min value (Limit for filtered DGE)
  --tscp_max TSCP_MAX   Transcript cutoff max value (Limit for filtered DGE)
  --cell_use CELL_USE   Cell count to use (+/- X-fold for filtered DGE)
  --cell_est CELL_EST   Cell count estimate (Min to X-fold for filtered DGE)
  --cell_xf CELL_XF     Cell estimate X-fold factor (For filtered DGE)
  --cell_min CELL_MIN   Cell count minimum (Lower limit for filtered DGE)
  --cell_max CELL_MAX   Cell count maximum (Upper limit for filtered DGE)
  --cell_list CELL_LIST
                        List of cell barcodes to use (No tscp cutoff calculated)
  --crispr              Run CRISPR analysis, mapping guide RNA to parent dir cells
  --crsp_guides CRSP_GUIDES
                        File with crispr guides and 5' 3' context sequences; csv
  --crsp_read_thresh CRSP_READ_THRESH
                        Minimum reads to qualify crispr transcripts
  --crsp_tscp_thresh CRSP_TSCP_THRESH
                        Minimum transcripts to qualify crispr guide
  --crsp_max_mm         Maximum mismatch (Hamming distance) for crispr guide mapping
  --crsp_use_star       Use STAR for crispr guide aligment
  --immune_check        Check immune database (BCR / TCR) installation status
  --bcr_analysis        Run BCR analysis
  --tcr_analysis        Run TCR analysis
  --immune_genome IMMUNE_GENOME
                        Immune (BCR / TCR) genome name
  --use_imgt_db         Use IMGT databse for immune (BCR / TCR) analysis
  --immune_read_thresh IMMUNE_READ_THRESH
                        Minimum reads to qualify immune transcripts
  --no_save_anndata     Do not save anndata h5ad files
  --kit_list            List valid kit names and chemistry versions
  --chem_list           List valid kit names and chemistry versions
  --bc_list             List installed barcode sets
  --bc_round_set ROUND NAME
                        Specify barcode use as  , where N = 1,2,3
  --rseed RSEED         Random number seed
  --nthreads NTHREADS   Number of threads to use (default = number of CPUs)
  --no_keep_going       Turn off keep_going (Stop on any error)
  --reuse               Reuse existing files if found (vs generate fresh)
  --keep_temps          Keep temp files
  --one_step            Do one step (mode) of pipeline, then stop
  --until_step UNTIL_STEP
                        Run until this step (mode) then stop
  --clear_runproc       Clear run process def files (Only); Need output_dir
  --start_timeout START_TIMEOUT
                        Time for statup env check steps; Zero to skip
  --kit_score_skip      Ignore kit score failure; WARNING Use with caution!
  --dryrun              Dry run; Only setup and report status; Saves run process file
  -e, --explain         Explain assumptions and usage details
  -V, --version         show program's version number and exit
[user@cn3335 ~]$ exit
[user@biowulf]$
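
The options in the help text can be combined into a single invocation. As a minimal sketch, the function below assembles a split-pipe command using only flags that appear in the help output and adds --dryrun, which according to the help text only sets up and reports status. The mode "all", the chemistry "v2", the sample name and well range, and every path are assumptions chosen for illustration; check split-pipe --explain and split-pipe --chem_list for the values that apply to your data. The function name spipe_dryrun is likewise hypothetical.

```shell
# All values below are placeholders (assumptions), not spipe defaults.
MODE="all"                       # assumed mode; see split-pipe --explain
CHEMISTRY="v2"                   # assumed; list valid values with --chem_list
GENOME_DIR="/path/to/genome"     # hypothetical reference genome directory
FQ1="/path/to/sub1_R1.fastq.gz"  # hypothetical mRNA reads
FQ2="/path/to/sub1_R2.fastq.gz"  # hypothetical barcode/polyN reads
OUT_DIR="/path/to/output"        # hypothetical output directory

spipe_dryrun() {
    # Build the argument list; sample name and well range are illustrative.
    set -- --mode "$MODE" --chemistry "$CHEMISTRY" \
           --genome_dir "$GENOME_DIR" --fq1 "$FQ1" --fq2 "$FQ2" \
           --output_dir "$OUT_DIR" --sample sampleA A1-A6 --dryrun

    if command -v split-pipe >/dev/null 2>&1; then
        split-pipe "$@"          # module loaded: perform the dry run
    else
        echo "split-pipe $*"     # otherwise only show the command
    fi
}

spipe_dryrun
```

Removing --dryrun from the argument list would start the actual run, so it is worth reviewing the dry-run report first.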