FreePSI is a new method for genome-wide percent spliced in (PSI) estimation that requires neither a reference transcriptome (hence, transcriptome-free) nor the mapping of RNA-seq reads (hence, alignment-free). The first attribute allows FreePSI to work effectively when a high quality reference transcriptome is unavailable and the second not only helps make FreePSI more efficient, it also eliminates the necessity of dealing with multi-reads.
Allocate an interactive session and run the program. In this example, we run a script that contains FreePSI commands. Reviewing the script provides individual command syntax.
Sample session (user input in bold):
[user@biowulf ~]$ sinteractive -c6 --mem=12g --gres=lscratch:10
salloc: Pending job allocation 33604907
salloc: job 33604907 queued and waiting for resources
salloc: job 33604907 has been allocated resources
salloc: Granted job allocation 33604907
salloc: Waiting for resource configuration
salloc: Nodes cn0849 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.33604907.0
slurmstepd: error: x11: unable to read DISPLAY value
[user@cn0849 ~]$ module load freepsi
[+] Loading freepsi 0.3 on cn0849
[+] Loading singularity 3.8.5-1 on cn0849
[+] Loading jellyfish 2.3.0
[user@cn0849 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn0849 33604907]$ cp -r $FREEPSI_TESTDATA .
[user@cn0849 33604907]$ cd TESTDATA/
[user@cn0849 TESTDATA]$ cat run_FreePSI.sh
#!/bin/bash
set -e
# Provide the directory containing Jellyfish (usually named as 'bin')
Jellyfish="$(dirname $(which jellyfish))"
# Provide the directory containing FreePSI
# E.g.
#FreePSI=../bin
FreePSI="${FREEPSI_HOME}/bin"
if [ -z ${Jellyfish} ]; then
echo "Error: Please modify me to provide the directory containing Jellyfish"
exit;
fi
if [ -z ${FreePSI} ]; then
echo "Error: Please modify me to provide the directory containing FreePSI"
exit;
fi
GENOME_DIR=./genome
BND_FILE=./annotation/hg38_refGene_exonBoundary_chr21.bed
READS=RNA-seq
K=27
THREAD=$SLURM_CPUS_PER_TASK
set -x
# Count k-mers in RNA-seq reads using jellyfish
${Jellyfish}/jellyfish count -m ${K} -s 100M -t ${THREAD} -Q 5 ${READS}/reads_final.1.fastq -o ${READS}/reads.1.jf
${Jellyfish}/jellyfish dump ${READS}/reads.1.jf -o ${READS}/reads.1.fa
${Jellyfish}/jellyfish count -m ${K} -s 100M -t ${THREAD} -Q 5 ${READS}/reads_final.2.fastq -o ${READS}/reads.2.jf
${Jellyfish}/jellyfish dump ${READS}/reads.2.jf -o ${READS}/reads.2.fa
# Produce raw estimates of PSI values using FreePSI
#Build
${FreePSI}/freePSI build\
-k $K -p ${THREAD} \
-g ${GENOME_DIR} \
-a ${BND_FILE} \
-1 ${READS}/reads.1.fa \
-2 ${READS}/reads.2.fa \
-o ./hashtable.json
#Quant
${FreePSI}/freePSI quant\
-k $K -p ${THREAD} \
-i ./hashtable.json \
-o .
# Post-process the raw estimates of PSI values
# python3 ${FreePSI}/postProc.py \
${FreePSI}/postProc.py \
./psi_freePSI_raw.json \
./psi_freePSI.json
# Summarize the PSI values into a readable file
# python3 ${FreePSI}/summary.py \
${FreePSI}/summary.py \
./annotation/hg38_refGene_exonBoundary_chr21.bed \
./psi_freePSI.json \
./psi_freePSI.summary
[user@cn0849 TESTDATA]$ ./run_FreePSI.sh
+ /usr/local/apps/jellyfish/2.3.0/bin/jellyfish count -m 27 -s 100M -t 6 -Q 5 RNA-seq/reads_final.1.fastq -o RNA-seq/reads.1.jf
+ /usr/local/apps/jellyfish/2.3.0/bin/jellyfish dump RNA-seq/reads.1.jf -o RNA-seq/reads.1.fa
+ /usr/local/apps/jellyfish/2.3.0/bin/jellyfish count -m 27 -s 100M -t 6 -Q 5 RNA-seq/reads_final.2.fastq -o RNA-seq/reads.2.jf
+ /usr/local/apps/jellyfish/2.3.0/bin/jellyfish dump RNA-seq/reads.2.jf -o RNA-seq/reads.2.fa
+ /usr/local/apps/freepsi/0.3/bin/freePSI build -k 27 -p 6 -g ./genome -a ./annotation/hg38_refGene_exonBoundary_chr21.bed -1 RNA-seq/reads.1.fa -2 RNA-seq/reads.2.fa -o ./hashtable.json
### Start to build theoretical and real kmer profile ...
[...snip]
Elasped time 2s.
### Start to refine solution and compute PSI ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
### Finish refining solution and computing PSI ...
### Finished!
CPU Time elapsed: 7s.
Natural Time elapsed: 3s.
+ /usr/local/apps/freepsi/0.3/bin/postProc.py ./psi_freePSI_raw.json ./psi_freePSI.json
+ /usr/local/apps/freepsi/0.3/bin/summary.py ./annotation/hg38_refGene_exonBoundary_chr21.bed ./psi_freePSI.json ./psi_freePSI.summary
[user@cn0849 TESTDATA]$ exit
exit
salloc: Relinquishing job allocation 33604907
[user@biowulf ~]$
Create a batch input file (e.g. freepsi.sh). Use the script in TESTDATA (shown above) as an example:
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] freepsi.sh
Create a swarmfile (e.g. freepsi.swarm). For the sake of simplicity we place a single command in each line, but you can place multiple commands on a single line seperated by semicolons or you can create scripts and execute a different script or the same script with different inputs on each line instead:
freePSI quant -k $K -p ${SLURM_CPUS_PER_TASK} -i /path/to/hashtable1.json -o /our/dir1
freePSI quant -k $K -p ${SLURM_CPUS_PER_TASK} -i /path/to/hashtable2.json -o /our/dir2
freePSI quant -k $K -p ${SLURM_CPUS_PER_TASK} -i /path/to/hashtable3.json -o /our/dir3
freePSI quant -k $K -p ${SLURM_CPUS_PER_TASK} -i /path/to/hashtable4.json -o /our/dir4
Submit this job using the swarm command.
swarm -f freepsi.swarm [-g #] [-t #] --module freepsiwhere
| -g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
| -t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
| --module freepsi | Loads the freepsi module for each subjob in the swarm |