CytoSPACE implements an optimization method for mapping individual cells from a single-cell RNA sequencing atlas to spatial expression profiles. Across diverse platforms and tissue types, it outperforms previous methods with respect to noise tolerance and accuracy, enabling tissue cartography at single-cell resolution.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=12g -c8 --gres=lscratch:20
[user@cn3335 ~]$ module load cytospace
[+] Loading singularity 4.0.1 on cn3335
[+] Loading cytospace 1.0.6
[user@cn3335 ~]$ cytospace -h
connect localhost port 6000: Connection refused
usage: cytospace [-h] -sp SCRNA_PATH -ctp CELL_TYPE_PATH [-stp ST_PATH] [-cp COORDINATES_PATH] [-srp SPACERANGER_PATH]
[-stctp ST_CELL_TYPE_PATH] [-ctfep CELL_TYPE_FRACTION_ESTIMATION_PATH] [-ncpsp N_CELLS_PER_SPOT_PATH]
[-o OUTPUT_FOLDER] [-op OUTPUT_PREFIX] [-mcn MEAN_CELL_NUMBERS] [--downsample-off]
[-smtpc SCRNA_MAX_TRANSCRIPTS_PER_CELL] [-sc] [-noss NUMBER_OF_SELECTED_SPOTS] [-sss]
[-nosss NUMBER_OF_SELECTED_SUB_SPOTS] [-nop NUMBER_OF_PROCESSORS] [-sm {lapjv,lapjv_compat,lap_CSPR}]
[-dm {Pearson_correlation,Spearman_correlation,Euclidean}] [-sam {duplicates,place_holders}] [-se SEED]
[-p] [-g GEOMETRY] [-nc NUM_COLUMN] [-mp MAX_NUM_CELLS_PLOT]
CytoSPACE is a computational strategy for assigning single-cell transcriptomes to in situ spatial transcriptomics (ST)
data. Our method solves single cell/spot assignment by minimizing a correlation-based cost function through a linear
programming-based optimization routine.
optional arguments:
-h, --help show this help message and exit
-stp ST_PATH, --st-path ST_PATH
Path to spatial transcriptomics data (expressions)
-cp COORDINATES_PATH, --coordinates-path COORDINATES_PATH
Path to transcriptomics data (coordinates)
-srp SPACERANGER_PATH, --spaceranger-path SPACERANGER_PATH
Path to SpaceRanger tar.gz data file
-stctp ST_CELL_TYPE_PATH, --st-cell-type-path ST_CELL_TYPE_PATH
Path to ST cell type file (recommended for single-cell ST)
-ctfep CELL_TYPE_FRACTION_ESTIMATION_PATH, --cell-type-fraction-estimation-path CELL_TYPE_FRACTION_ESTIMATION_PATH
Path to ST cell type fraction file (recommended for bulk ST)
-ncpsp N_CELLS_PER_SPOT_PATH, --n-cells-per-spot-path N_CELLS_PER_SPOT_PATH
Path to number of cells per ST spot file
-o OUTPUT_FOLDER, --output-folder OUTPUT_FOLDER
Relative path to the output folder
-op OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX
Prefix of results stored in the 'output_folder'
-mcn MEAN_CELL_NUMBERS, --mean-cell-numbers MEAN_CELL_NUMBERS
Mean number of cells per spot, default 5 (appropriate for Visium). If analyzing legacy spatial
transcriptomics data, set to 20
--downsample-off Turn off downsampling for scRNA-seq data
-smtpc SCRNA_MAX_TRANSCRIPTS_PER_CELL, --scRNA_max_transcripts_per_cell SCRNA_MAX_TRANSCRIPTS_PER_CELL
Number of transcripts per cell to downsample scRNA-seq dataset to. This allows for assignments
that are not dependent on the overall expression level
-sc, --single-cell Use single-cell spatial approach if specified
-noss NUMBER_OF_SELECTED_SPOTS, --number-of-selected-spots NUMBER_OF_SELECTED_SPOTS
Number of selected spots from ST data used in each iteration
-sss, --sampling-sub-spots
Sample subspots to limit the number of mapped cells if specified
-nosss NUMBER_OF_SELECTED_SUB_SPOTS, --number-of-selected-sub-spots NUMBER_OF_SELECTED_SUB_SPOTS
Number of selected subspots from ST data to limit the number of mapped cells
-nop NUMBER_OF_PROCESSORS, --number-of-processors NUMBER_OF_PROCESSORS
Number of processors used for the analysis
-sm {lapjv,lapjv_compat,lap_CSPR}, --solver-method {lapjv,lapjv_compat,lap_CSPR}
Which solver to use for the linear assignment problem, default 'lapjv'
-dm {Pearson_correlation,Spearman_correlation,Euclidean}, --distance-metric {Pearson_correlation,Spearman_correlation,Euclidean}
Which distance metric to use for the cost matrix, default 'Pearson_correlation'
-sam {duplicates,place_holders}, --sampling-method {duplicates,place_holders}
Which underlying method to use for dealing with duplicated cells, default 'duplicates'
-se SEED, --seed SEED
Set seed for random generators, default 1
-p, --plot-off Turn create plots on/off
-g GEOMETRY, --geometry GEOMETRY
ST geometry, either 'honeycomb' or 'square' accepted
-nc NUM_COLUMN, --num-column NUM_COLUMN
Number of columns in figure
-mp MAX_NUM_CELLS_PLOT, --max-num-cells-plot MAX_NUM_CELLS_PLOT
Maximum number of cells to plot in single-cell visualization
Required arguments:
-sp SCRNA_PATH, --scRNA-path SCRNA_PATH
Path to scRNA-Seq data
-ctp CELL_TYPE_PATH, --cell-type-path CELL_TYPE_PATH
Path to cell type labels
[user@cn3335 ~]$ mkdir /data/$USER/cytospace && cd /data/$USER/cytospace
[user@cn3335 ~]$ cp $CS_DATA/* .
[user@cn3335 ~]$ cytospace \
-sp brca_scRNA_GEP.txt \
-ctp brca_scRNA_celllabels.txt \
-stp brca_STdata_GEP.txt \
-cp brca_STdata_coordinates.txt
...
Read and validate data ...
100% |██████████████████████████████████████████████████| Reading data [done]
Estimating cell type fractions
2024-01-25 09:08:04 Load ST data
PC_ 1
Positive: IGKC, IGHG1, DCN, IGHG2, IGHA1, COL6A3, APOE, JCHAIN, LUM, MMP2
AEBP1, IGLC1, IGHG3, COL3A1, HLA-DRA, C1R, SFRP4, HMOX1, VIM, POSTN
SPARC, COL6A1, IGHG4, LYZ, SFRP2, COL1A1, C3, APOC1, COL6A2, COL1A2
Negative: AZGP1, MUCL1, SCGB2A2, ERBB2, KRT7, CD24, SCGB1D2, MAL2, CRISP3, ATG5
TACSTD2, SPINT2, PPDPF, KRT8, LCN2, LTF, PIGR, SLPI, CLDN4, CRABP2
KIAA1324, PSMD3, CFB, ORMDL3, FGB, ARPC1A, FOXA1, S100A9, PDZK1IP1, IFI6
PC_ 2
Positive: SFRP4, DCN, COL6A3, IGKC, MMP2, SFRP2, AEBP1, IGHG1, LUM, C1R
COL3A1, CCN2, COL1A2, COL1A1, SPARC, CCDC80, IGLC1, IGFBP7, FBLN1, C1S
JCHAIN, IGHG4, CXCL14, IGHG2, C3, IGFBP4, CTSK, FBLN2, IGHA1, VCAN
Negative: APOC1, APOE, FTL, SPP1, CTSL, IFI30, CTSB, CTSD, CD68, LAPTM5
SLC11A1, ACP5, TYROBP, PLIN2, FABP5, GLUL, GPNMB, SDS, CTSZ, PLAUR
HMOX1, SAT1, AQP9, SCD, FCGR3A, C1QB, LYZ, TREM2, FCER1G, PSAP
PC_ 3
Positive: HMOX1, POSTN, CTSB, FGB, ACTA2, FGG, TAGLN, AEBP1, SPARC, BGN
SPP1, COL1A1, CCN2, FTL, TGFBI, MYL9, COL1A2, COL5A1, LUM, TIMP1
SULF1, CTSL, FN1, DCN, IGHG3, GLUL, COL3A1, APOE, CTSD, APOC1
Negative: CCL19, TRAC, TRBC2, LTB, CXCL9, CCL5, TRBC1, IL7R, LAMP3, BIRC3
CXCR4, ISG15, CXCL11, CD3D, PTPRC, IFITM1, CCR7, IFI6, IDO1, CORO1A
CD37, SELL, UBD, CXCL13, IKZF1, ISG20, CD3E, RAC2, IFI44L, IL2RG
PC_ 4
Positive: IGKC, IGHG1, IGHG2, JCHAIN, IGHA1, IGHG4, IGLC1, IGHG3, IGKV4-1, HMOX1
IGHM, SFRP4, IGHJ6, C3, DERL3, MZB1, MMP2, DCN, PTGDS, IGHV6-1
XBP1, PIM2, SCGB2A2, TENT5C, TXNDC5, SCD, POU2AF1, IGHD, CCDC80, SELENOP
Negative: COL4A1, IGFBP7, ACTA2, COL4A2, HSPG2, MCAM, PLVAP, TIMP3, MYL9, VWF
A2M, TIMP1, TAGLN, CST1, TPM2, ENG, SPARC, KRT5, S100A2, MMP11
COL15A1, ID1, MYLK, CD93, LAMC2, PODXL, FN1, POSTN, CDH5, CALD1
PC_ 5
Positive: LTF, FGB, FGG, RARRES1, CLU, LCN2, WFDC2, S100A9, SERPINA3, CP
LBP, PDZK1IP1, ORM1, AGT, SLPI, CAPN13, TGM2, RDH10, TACSTD2, CHI3L2
FGA, ORM2, MGP, UBD, SLC34A2, SOD2, ELF3, GPRC5A, GABRP, CFB
Negative: SCGB2A2, FADS2, TOP2A, PPP1R1A, MUCL1, PEG10, SCGB1D2, HIST1H1B, PIP, HIST1H2BH
GATA3, HIST1H2BG, C2orf72, HIST1H3H, CYP4Z1, HIST1H4A, UBE2C, DBI, HIST1H4D, TRPS1
CDC6, ADAMTS1, NQO1, NPNT, NRG1, ASPH, SPDEF, CLEC3A, FASN, HIST1H2BO
2024-01-25 09:08:48 Load scRNA data
PC_ 1
Positive: IGFBP7, SPARC, COL1A2, COL1A1, COL3A1, CALD1, COL6A2, TAGLN, BGN, MYL9
LUM, THY1, DCN, TPM2, POSTN, COL5A2, COL6A3, IGFBP4, AEBP1, COL6A1
CTHRC1, ACTA2, C1S, IFITM3, SFRP2, RARRES2, CTGF, TIMP3, VCAN, CTSK
Negative: HLA-DRA, HLA-DRB1, TYROBP, CD74, HLA-DPB1, HLA-DPA1, HLA-DQA1, CCL4, FCER1G, CCL5
CD69, SRGN, HLA-DQB1, LYZ, CXCR4, C1QB, C1QA, RGS1, C1QC, LAPTM5
CCL3, NKG7, FCGR3A, CD52, AIF1, DUSP2, CD83, APOC1, CCL4L2, PTPRC
PC_ 2
Positive: MUCL1, CD24, KRT7, CALML5, KRT18, SCGB1B2P, NKG7, FXYD3, KRT8, CD69
GZMA, MGST1, CD7, GNLY, CLDN4, AZGP1, CCL5, SLPI, CD2, ERBB2
KLRB1, RPL13A, PERP, CD3D, TACSTD2, CD3E, S100P, TM4SF1, GZMB, ELF3
Negative: HLA-DRA, FTL, HLA-DRB1, CD74, HLA-DPA1, C1QA, C1QB, HLA-DPB1, HLA-DQA1, C1QC
APOE, LYZ, TYROBP, CTSB, APOC1, FTH1, HLA-DQB1, FCER1G, CD68, CST3
AIF1, MS4A6A, CTSD, FCGR3A, CTSS, PSAP, CTSZ, FN1, MS4A7, CCL3
PC_ 3
Positive: CD24, KRT7, CALML5, MUCL1, TM4SF1, KRT18, KRT8, MGST1, FXYD3, AZGP1
SLPI, CLDN4, ERBB2, TACSTD2, SPINT2, DBI, MIEN1, GRB7, S100P, ELF3
CRIP2, PERP, C17orf89, PSMD3, KRT19, MIF, EPCAM, S100A14, TM7SF2, LMTK3
Negative: CCL5, CD69, NKG7, GZMA, GNLY, CD7, CCL4, CXCR4, CD2, CST7
GZMB, KLRB1, CD3E, RGCC, CD3D, IL7R, IL32, CD52, TRBC2, CTSW
PTPRC, TRBC1, TNFAIP3, TRAC, KLRD1, DUSP2, B2M, IFNG, SRGN, RHOH
PC_ 4
Positive: MUCL1, CD24, KRT7, CALML5, KRT18, MGST1, FXYD3, KRT8, AZGP1, CLDN4
SLPI, DBI, SCGB1B2P, ERBB2, SPINT2, LUM, DCN, PERP, SFRP2, TACSTD2
CTSK, S100P, RARRES2, ELF3, COL1A1, SDC1, MIEN1, COL3A1, GRB7, AEBP1
Negative: PLVAP, RAMP2, CALCRL, VWF, PECAM1, SPARCL1, IGFBP7, HSPG2, AQP1, RAMP3
ADGRL4, ESAM, EMCN, CLEC14A, GNG11, CD34, CD93, COL4A1, EGFL7, RNASE1
ENG, COL4A2, IFITM3, A2M, IFITM1, ADAMTS1, SPRY1, CDH5, CXorf36, FLT1
PC_ 5
Positive: NDUFA4L2, RGS5, COL18A1, MCAM, NOTCH3, SOD3, LHFP, PPP1R14A, CCDC102B, ADIRF
HIGD1B, TBX2, PDGFA, NR2F2, CPE, C11orf96, PGF, PLXDC1, TPPP3, COL4A2
COX4I2, EPS8, CALD1, COL4A1, ID4, ENPEP, SEPT4, PDGFRB, EGFL6, ACTA2
Negative: CTHRC1, MMP2, SFRP2, CTSK, DCN, COL10A1, RARRES2, FBLN1, COL11A1, LUM
MFAP5, THBS2, HTRA1, HSPG2, NBL1, RAMP2, SFRP4, PLVAP, COL8A1, WISP2
VCAN, AEBP1, CCDC80, FAP, ITGBL1, VWF, CXCL12, PECAM1, PDGFRL, DPYSL3
2024-01-25 09:09:29 Integration
Performing PCA on the provided reference using 2118 features as input.
Projecting PCA
Finding neighborhoods
Finding anchors
Found 1129 anchors
Filtering anchors
Retained 1043 anchors
Finding integration vectors
Finding integration vector weights
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Predicting cell labels
100% |██████████████████████████████████████████████████| Reading data [done]
Time to read and validate data: 180.52 seconds
Estimating number of cells in each spot ...
Time to estimate number of cells per spot: 0.99 seconds
Down/up sample of scRNA-seq data according to estimated cell type fractions
Time to down/up sample scRNA-seq data: 6.02 seconds
Building cost matrix ...
Time to build cost matrix: 6.22 seconds
Solving linear assignment problem ...
Time to solve linear assignment problem: 96.06 seconds
Total time to run CytoSPACE core algorithm: 114.61 seconds
Saving results ...
100% |██████████████████████████████████████████████████| Reading data [done]
Detecting row and column indexing of Visium data; rescaling for coordinates
findfont: Font family ['sans-serif'] not found. Falling back to DejaVu Sans.
findfont: Generic family 'sans-serif' not found because none of the following families were found: Arial
findfont: Font family ['sans-serif'] not found. Falling back to DejaVu Sans.
findfont: Generic family 'sans-serif' not found because none of the following families were found: Arial
Detecting row and column indexing of Visium data; rescaling for coordinates
Total execution time: 363.29 seconds
[user@cn3335 ~]$ exit
[user@biowulf]$
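
The help text in the session above describes the core idea as assigning cells to spots by minimizing a correlation-based cost through a linear assignment solver. As a rough illustration of that idea only, the Python sketch below maps three made-up cells to three made-up spots by trying every one-to-one assignment and keeping the one with the lowest total cost, where cost is the negative Pearson correlation. This is not CytoSPACE's implementation: the function names, the toy expression values, and the exhaustive search are assumptions chosen for readability, whereas CytoSPACE itself relies on the dedicated solvers listed under -sm (lapjv, lapjv_compat, lap_CSPR).

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def assign_cells_to_spots(cells, spots):
    """Toy one-to-one assignment (hypothetical helper, not a CytoSPACE API).

    cost[i][j] is the negative Pearson correlation between cell i and spot j,
    so a well-correlated pair is cheap. Every permutation is scored, which is
    only feasible for a handful of cells.
    """
    cost = [[-pearson(c, s) for s in spots] for c in cells]
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(spots))):
        total = sum(cost[i][perm[i]] for i in range(len(cells)))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost


# Made-up expression values over four genes (illustrative only).
cells = [
    [9, 1, 0, 2],   # cell 0
    [0, 8, 1, 1],   # cell 1
    [1, 0, 7, 3],   # cell 2
]
spots = [
    [1, 9, 2, 1],   # spot 0, resembles cell 1
    [2, 1, 8, 4],   # spot 1, resembles cell 2
    [10, 2, 1, 3],  # spot 2, resembles cell 0
]

assignment, total_cost = assign_cells_to_spots(cells, spots)
print(assignment)  # [2, 0, 1]: cell 0 -> spot 2, cell 1 -> spot 0, cell 2 -> spot 1
```

Exhaustive search grows factorially with the number of cells, which is why a real run like the one above depends on a specialized linear assignment solver rather than on enumeration.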