IsoQuant on Biowulf

IsoQuant analyzes long-read RNA sequencing data from PacBio or Oxford Nanopore.

IsoQuant reconstructs and quantifies transcript models with high precision and decent recall. If a reference annotation is given, IsoQuant also assigns reads to annotated isoforms based on their intron and exon structure, and quantifies annotated genes, isoforms, exons and introns. If reads are grouped (e.g. by cell type), counts are reported per group.

References:

Prjibelski, A.D., Mikheenko, A., Joglekar, A. et al. Accurate isoform discovery with IsoQuant using long reads. Nat Biotechnol 41, 915–918 (2023).

Documentation

IsoQuant Main Site

Important Notes

Module Name: isoquant (see the modules page for more information)
Multithreaded. IsoQuant defaults to 16 threads, so allocate CPUs accordingly or lower the thread count with the --threads option.
isoquant jobs can be resumed if they run out of wall time.
In installations prior to 3.13.0, the executable is isoquant.py.
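
Because the executable name depends on the installed version, scripts that must work across module versions can look it up at run time. The following is a minimal sketch, not part of the Biowulf module; the function name find_isoquant is hypothetical, and it assumes the loaded module puts either isoquant or isoquant.py on PATH.

```shell
#!/bin/bash
# Print the name of whichever IsoQuant executable is on PATH.
# Assumption: newer installs provide `isoquant`, older ones `isoquant.py`.
find_isoquant() {
    if command -v isoquant >/dev/null 2>&1; then
        echo isoquant
    elif command -v isoquant.py >/dev/null 2>&1; then
        echo isoquant.py
    else
        echo "no IsoQuant executable on PATH; run 'module load isoquant' first" >&2
        return 1
    fi
}
```

After module load isoquant, a script could then run exe=$(find_isoquant) && "$exe" --help instead of hard-coding either name.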

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):

[user@biowulf ~]$ sinteractive --mem=8G -c4
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load isoquant

[user@cn3144 ~]$ cd /data/$USER

[user@cn3144 ~]$ mkdir -p isoquant_test/output

[user@cn3144 ~]$ cd isoquant_test

[user@cn3144 ~]$ cp -r $ISOQUANT_TEST_DATA/toy_data .

[user@cn3144 ~]$ isoquant --reference toy_data/MAPT.Mouse.reference.fasta \
 --genedb toy_data/MAPT.Mouse.genedb.gtf \
 --fastq toy_data/MAPT.Mouse.ONT.simulated.fastq \
 --data_type nanopore \
 -o output --threads 4
2026-05-20 14:41:14,222 - INFO - Running IsoQuant version 3.13.0
2026-05-20 14:41:14,222 - WARNING - Output folder already exists, some files may be overwritten.
2026-05-20 14:41:14,223 - INFO - Novel unspliced transcripts will not be reported, set --report_novel_unspliced true to discover them
2026-05-20 14:41:14,223 - INFO -  === IsoQuant pipeline started ===
2026-05-20 14:41:14,223 - INFO - Python version: 3.13.13 | packaged by conda-forge | (main, Apr  8 2026, 02:00:33) [GCC 14.3.0]
2026-05-20 14:41:14,223 - INFO - gffutils version: 0.14
2026-05-20 14:41:14,223 - INFO - pysam version: 0.24.0
2026-05-20 14:41:14,223 - INFO - pyfaidx version: 0.9.0.4
2026-05-20 14:41:14,223 - INFO - Reading reference genome from /lscratch/19875279/toy_data/MAPT.Mouse.reference.fasta
2026-05-20 14:41:14,224 - INFO - Checking input gene annotation
2026-05-20 14:41:14,225 - INFO - Gene annotation seems to be correct
2026-05-20 14:41:14,225 - INFO - Converting gene annotation file to .db format (takes a while)...
...
2026-05-20 14:41:15,076 - INFO - Finished processing chromosome chr11
2026-05-20 14:41:15,086 - INFO - Read assignments are stored in output/OUT/OUT.read_assignments.tsv.gz.gz
2026-05-20 14:41:15,086 - INFO - Read assignment statistics
2026-05-20 14:41:15,086 - INFO -   noninformative: 15
2026-05-20 14:41:15,086 - INFO -   unique: 117
2026-05-20 14:41:15,086 - INFO - Gene counts are stored in output/OUT/OUT.gene_counts.tsv
2026-05-20 14:41:15,086 - INFO - Transcript counts are stored in output/OUT/OUT.transcript_counts.tsv
2026-05-20 14:41:15,086 - INFO - Counts can be converted to other formats using isoquant_lib/convert_grouped_counts.py
2026-05-20 14:41:15,086 - INFO - Transcript model statistics
2026-05-20 14:41:15,087 - INFO -   known: 10
2026-05-20 14:41:15,087 - INFO - Transcript model file output/OUT/OUT.transcript_models.gtf
2026-05-20 14:41:15,087 - INFO - Extended annotation is saved to output/OUT/OUT.extended_annotation.gtf
2026-05-20 14:41:15,087 - INFO - Counts for generated transcript models are saves to: output/OUT/OUT.discovered_transcript_counts.tsv
2026-05-20 14:41:15,088 - INFO - Processed experiment OUT
2026-05-20 14:41:15,088 - INFO - Processed 1 experiment
2026-05-20 14:41:15,088 - INFO -  === IsoQuant pipeline finished ===
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job

Most jobs should be run as batch jobs.

Create a batch input file (e.g. isoquant.sh). For example:

#!/bin/bash
set -e
module load isoquant
cd /data/$USER/analysis
isoquant -d pacbio_ccs --bam mapped_reads.bam --genedb annotation.db --threads $SLURM_CPUS_PER_TASK --output output_dir

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] [--time=DD-HH:MM:SS] isoquant.sh
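
Since jobs can be resumed after running out of wall time, the batch script above could be made restartable. The sketch below is one possible approach under stated assumptions, and it only prints the command line so it can be checked before submission. It assumes the --resume option described in the IsoQuant documentation (verify with isoquant --help for the loaded version), treats an existing output directory as a sign of an earlier unfinished run, and falls back to 4 threads when SLURM_CPUS_PER_TASK is unset. The function name isoquant_cmd and the fallback value are illustrative.

```shell
#!/bin/bash
# Print the IsoQuant command for a fresh or a resumed run (dry run only).
# Assumptions: --resume is available in the loaded IsoQuant version, and an
# existing output directory means an earlier run was cut short.
isoquant_cmd() {
    local outdir=$1
    # Use the Slurm allocation when present; 4 is an arbitrary fallback.
    local threads=${SLURM_CPUS_PER_TASK:-4}
    if [ -d "$outdir" ]; then
        echo "isoquant --resume --output $outdir --threads $threads"
    else
        echo "isoquant -d pacbio_ccs --bam mapped_reads.bam --genedb annotation.db --output $outdir --threads $threads"
    fi
}

isoquant_cmd output_dir
```

In a real batch script the printed command would be executed rather than echoed. A completed run also leaves its output directory behind, so the directory check is only a rough heuristic.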

Swarm of Jobs

A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. isoquant.swarm). For example:

isoquant -d pacbio_ccs --bam reads1.bam --genedb ann1.db --output out1 --threads $SLURM_CPUS_PER_TASK
isoquant -d pacbio_ccs --bam reads2.bam --genedb ann2.db --output out2 --threads $SLURM_CPUS_PER_TASK
isoquant -d pacbio_ccs --bam reads3.bam --genedb ann3.db --output out3 --threads $SLURM_CPUS_PER_TASK
isoquant -d pacbio_ccs --bam reads4.bam --genedb ann4.db --output out4 --threads $SLURM_CPUS_PER_TASK

Submit this job using the swarm command.

swarm -f isoquant.swarm [-g #] [-t #] [--time ...] --module isoquant

where

`-g #`	Gigabytes of memory required for each process (one line in the swarmfile)
`-t #`	Threads/CPUs required for each process (one line in the swarmfile)
`--time ...`	Wall time for each job, in the format HH:MM:SS
`--module isoquant`	Loads the isoquant module for each subjob in the swarm
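
For more than a handful of samples, the swarmfile can be generated rather than typed. The following is a minimal sketch that reproduces the four example lines above. The function name make_swarm is hypothetical, and it assumes the reads<N>.bam / ann<N>.db / out<N> naming used in the example.

```shell
#!/bin/bash
# Write one IsoQuant command per sample index to a swarmfile.
# The format string is single-quoted so $SLURM_CPUS_PER_TASK stays literal
# in the file and is expanded by each subjob at run time.
make_swarm() {
    local n=$1 out=$2 i
    : > "$out"    # create or truncate the swarmfile
    for i in $(seq 1 "$n"); do
        printf 'isoquant -d pacbio_ccs --bam reads%d.bam --genedb ann%d.db --output out%d --threads $SLURM_CPUS_PER_TASK\n' \
            "$i" "$i" "$i" >> "$out"
    done
}

make_swarm 4 isoquant.swarm
```

The resulting isoquant.swarm can be submitted with the swarm command shown above.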