deepmod2 on Biowulf

DeepMod2 is a tool for detecting DNA 5mC methylation from Oxford Nanopore reads. It can call methylation from POD5 and FAST5 files basecalled with either Guppy or Dorado. The main output is a methylation-tagged BAM file.

References:

Ahsan, M.U., Gouru, A., Chan, J. et al. A signal processing and deep learning framework for methylation detection using Oxford Nanopore sequencing. Nat Commun 15, 1448 (2024). DOI:10.1038/s41467-024-45778-y

Documentation

DeepMod2 GitHub repository

Important Notes

Module Name: deepmod2 (see the modules page for more information)
Environment variables set:
- DEEPMOD2_HOME
- DEEPMOD2_TEST_DATA
Example data files are in $DEEPMOD2_TEST_DATA.
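
A quick check that the module variables listed above are defined in the current shell can save a confusing failure later. The following is a minimal sketch; the function name check_deepmod2_env is made up for this example, and it only assumes that the variables are exported by the module:

```shell
# Hypothetical helper: report the variables the deepmod2 module is documented to set.
check_deepmod2_env() {
    for var in DEEPMOD2_HOME DEEPMOD2_TEST_DATA; do
        # printenv prints the value of an exported variable and fails if it is unset
        if value=$(printenv "$var"); then
            echo "${var}=${value}"
        else
            echo "${var} is not set; run 'module load deepmod2' first"
        fi
    done
}

check_deepmod2_env

# List the example data only when the variable points at an existing directory
if [ -d "${DEEPMOD2_TEST_DATA:-}" ]; then
    ls -lh "${DEEPMOD2_TEST_DATA}"
fi
```

Run it after module load deepmod2; both variables should then be reported with their values.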

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session (based on DeepMod2's tutorial):

[user@biowulf ~]$ sinteractive --mem=15g --cpus-per-task=8 --gres=gpu:v100x:1
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load deepmod2 dorado samtools minimap2

[user@cn3144 ~]$ cd /data/${USER}
[user@cn3144 ~]$ INPUT_DIR=data
[user@cn3144 ~]$ OUTPUT_DIR=mod

[user@cn3144 ~]$ mkdir -pv ${INPUT_DIR}/nanopore_raw_data ${OUTPUT_DIR}
[user@cn3144 ~]$ tar xzf ${DEEPMOD2_TEST_DATA}/sample.pod5.tar.gz -C ${INPUT_DIR}/nanopore_raw_data

[user@cn3144 ~]$ dorado basecaller --emit-moves --recursive \
                 ${DORADO_MODELS}/dna_r10.4.1_e8.2_400bps_hac@v4.3.0 \
                 ${INPUT_DIR}/nanopore_raw_data > ${OUTPUT_DIR}/basecalled.bam
[2025-06-05 09:26:27.940] [info] Running: "basecaller" "--emit-moves" "--recursive" "/fdb/dorado/0.9.6/dna_r10.4.1_e8.2_400bps_hac@v4.3.0" "data/nanopore_raw_data"
[2025-06-05 09:26:28.099] [info] Normalised: overlap 500 -> 498
[2025-06-05 09:26:28.099] [info] Normalised: chunksize 10000 -> 9996
[2025-06-05 09:26:28.099] [info] > Creating basecall pipeline
[2025-06-05 09:26:29.108] [info] Calculating optimized batch size for GPU "Tesla V100-SXM2-32GB" and model dna_r10.4.1_e8.2_400bps_hac@v4.3.0. Full benchmarking will run for this device, which may take some time.
[2025-06-05 09:28:05.773] [info] cuda:0 using chunk size 9996, batch size 3328
[2025-06-05 09:28:06.912] [info] cuda:0 using chunk size 4998, batch size 6784
[2025-06-05 09:28:12.714] [info] > Finished in (ms): 3608
[2025-06-05 09:28:12.714] [info] > Simplex reads basecalled: 59
[2025-06-05 09:28:12.714] [info] > Basecalled @ Samples/s: 7.746490e+06
[2025-06-05 09:28:12.714] [info] > Finished

[user@cn3144 ~]$ samtools fastq ${OUTPUT_DIR}/basecalled.bam -T "*" | \
                 minimap2 -ax map-ont \
                 /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa - -y | \
                 samtools view -o ${OUTPUT_DIR}/aligned.bam
[M::mm_idx_gen::75.344*1.59] collected minimizers
[M::mm_idx_gen::94.013*1.86] sorted minimizers
[M::main::94.013*1.86] loaded/built the index for 195 target sequence(s)
[M::mm_mapopt_update::96.431*1.84] mid_occ = 694
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 195
[M::mm_idx_stat::97.809*1.83] distinct minimizers: 100167746 (38.80% are singletons); average occurrences: 5.519; average spacing: 5.607; total length: 3099922541
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 61 reads
[M::worker_pipeline::99.833*1.81] mapped 61 sequences
[M::main] Version: 2.29-r1283
[M::main] CMD: minimap2 -ax map-ont -y /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa -
[M::main] Real time: 100.028 sec; CPU: 180.762 sec; Peak RSS: 11.348 GB

[user@cn3144 ~]$ deepmod2 detect --seq_type dna --model bilstm_r10.4.1_5khz_v4.3 \
                 --file_type pod5 --bam mod/aligned.bam --input data/nanopore_raw_data \
                 --output mod/deepmod2/ \
                 --ref /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa \
                 --threads 8
2025-06-05 10:45:20.736429: Starting Per Read Methylation Detection.
2025-06-05 10:45:20.770338: Getting motif positions from the reference.
2025-06-05 10:48:28.819651: Finished getting motif positions from the reference.
2025-06-05 10:48:28.890963: Building BAM index.
2025-06-05 10:48:28.924808: Finished building BAM index.
2025-06-05 10:48:30.101261: Reading inputs complete.
2025-06-05 10:48:51.120458: Model predictions complete. Wrapping up output.
2025-06-05 10:48:51.376787: Number of reads processed: 57
2025-06-05 10:48:51.376847: Finished Per-Read Methylation Output. Starting Per-Site output.
2025-06-05 10:48:51.376857: Modification Tagged BAM file: mod/deepmod2/output.bam
2025-06-05 10:48:51.376873: Per Read Prediction file: mod/deepmod2/output.per_read
2025-06-05 10:48:51.376888: Writing Per Site Methylation Detection.
2025-06-05 10:48:51.413912: Finished Writing Per Site Methylation Output.
2025-06-05 10:48:51.413942: Per Site Prediction file: mod/deepmod2/output.per_site
2025-06-05 10:48:51.413951: Aggregated Per Site Prediction file: mod/deepmod2/output.per_site.aggregated

2025-06-05 10:48:53.018012: Time elapsed=213.6859s

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
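
The log above reports mod/deepmod2/output.bam as the modification-tagged BAM file. One way to sanity-check it is to count the reads that carry a modification tag. The sketch below assumes the modifications are stored in the standard SAM MM tag (with ML for probabilities); the function name count_mm_tagged is made up for this example:

```shell
# Hypothetical helper: read SAM records on stdin and count those with an MM:Z: tag.
count_mm_tagged() {
    awk -F'\t' '
        # Optional tags start at field 12 of a SAM record
        { for (i = 12; i <= NF; i++) if ($i ~ /^MM:Z:/) { n++; break } }
        END { print n + 0 }
    '
}

# Only run when samtools is loaded and the output from the session above exists
if command -v samtools >/dev/null 2>&1 && [ -f mod/deepmod2/output.bam ]; then
    samtools view mod/deepmod2/output.bam | count_mm_tagged
fi
```

A count of zero would suggest that the tags were not written or that a different tag name is in use.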

Batch job

Most jobs should be run as batch jobs.

Create a batch input file (e.g. deepmod2.sh). For example:

#!/bin/bash
#SBATCH --job-name=deepmod2
#SBATCH --partition=gpu
#SBATCH --gres=gpu:v100:1
#SBATCH --mem=16g
#SBATCH --cpus-per-task=8
#SBATCH --time=1:00:00

module load deepmod2 dorado samtools minimap2

cd /data/${USER}
INPUT_DIR=data
OUTPUT_DIR=mod

mkdir -pv ${INPUT_DIR}/nanopore_raw_data ${OUTPUT_DIR}
tar xzf ${DEEPMOD2_TEST_DATA}/sample.pod5.tar.gz -C ${INPUT_DIR}/nanopore_raw_data

dorado basecaller --emit-moves --recursive \
                 ${DORADO_MODELS}/dna_r10.4.1_e8.2_400bps_hac@v4.3.0 \
                 ${INPUT_DIR}/nanopore_raw_data > ${OUTPUT_DIR}/basecalled.bam

samtools fastq ${OUTPUT_DIR}/basecalled.bam -T "*" | \
                 minimap2 -ax map-ont \
                 /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa - -y | \
                 samtools view -o ${OUTPUT_DIR}/aligned.bam

deepmod2 detect --seq_type dna --model bilstm_r10.4.1_5khz_v4.3 \
                 --file_type pod5 --bam mod/aligned.bam --input data/nanopore_raw_data \
                 --output mod/deepmod2/ \
                 --ref /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa \
                 --threads 8

Submit this job using the Slurm sbatch command.

sbatch deepmod2.sh
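
To keep track of the job after submission, the job ID can be captured at submit time. The sketch below uses sbatch --parsable, which prints only the job ID (optionally followed by ";cluster"); the function name submit_job is made up for this example:

```shell
# Hypothetical helper: submit a batch script and print only the numeric job ID.
submit_job() {
    # --parsable output is "jobid" or "jobid;cluster"
    jobid=$(sbatch --parsable "$1") || return 1
    # Strip the optional ";cluster" suffix
    echo "${jobid%%;*}"
}

# Only submit when Slurm is available and the batch script exists
if command -v sbatch >/dev/null 2>&1 && [ -f deepmod2.sh ]; then
    jobid=$(submit_job deepmod2.sh)
    echo "Submitted job ${jobid}"
    squeue -j "${jobid}"
fi
```

Once the job has finished, sacct -j with the same job ID reports its final state.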

Swarm of Jobs

A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. deepmod2.swarm). For example:

deepmod2 detect --seq_type dna --model bilstm_r10.4.1_5khz_v4.3 \
                 --file_type pod5 --bam mod/aligned_01.bam --input data/nanopore_raw_data \
                 --output mod/deepmod2_01/ \
                 --ref /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa \
                 --threads 8
deepmod2 detect --seq_type dna --model bilstm_r10.4.1_5khz_v4.3 \
                 --file_type pod5 --bam mod/aligned_02.bam --input data/nanopore_raw_data \
                 --output mod/deepmod2_02/ \
                 --ref /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa \
                 --threads 8
deepmod2 detect --seq_type dna --model bilstm_r10.4.1_5khz_v4.3 \
                 --file_type pod5 --bam mod/aligned_03.bam --input data/nanopore_raw_data \
                 --output mod/deepmod2_03/ \
                 --ref /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa \
                 --threads 8
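
For more than a few samples, the swarmfile can be generated with a loop instead of written by hand. The sketch below assumes a hypothetical layout with one aligned BAM per sample named aligned_<id>.bam. It writes one command line per BAM and gives each run its own output directory so that results do not overwrite each other. The function name make_swarmfile is made up for this example:

```shell
REF=/fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa

# Hypothetical helper: $1 = directory holding aligned_*.bam, $2 = swarmfile to write
make_swarmfile() {
    bam_dir=$1
    swarmfile=$2
    : > "$swarmfile"    # start with an empty swarmfile
    for bam in "$bam_dir"/aligned_*.bam; do
        [ -e "$bam" ] || continue    # skip the unexpanded pattern if nothing matches
        sample=$(basename "$bam" .bam)
        sample=${sample#aligned_}
        # One swarm command per line, each with a distinct --output directory
        printf '%s %s %s\n' \
            "deepmod2 detect --seq_type dna --model bilstm_r10.4.1_5khz_v4.3" \
            "--file_type pod5 --bam $bam --input data/nanopore_raw_data" \
            "--output $bam_dir/deepmod2_$sample/ --ref $REF --threads 8" >> "$swarmfile"
    done
}

# Only generate the file when the directory from the examples above exists
if [ -d mod ]; then
    make_swarmfile mod deepmod2.swarm
fi
```

Check the generated deepmod2.swarm before submitting it.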

Submit this job using the swarm command.

swarm -f deepmod2.swarm [-g #] [-t #] --module deepmod2

where

-g #	Number of gigabytes of memory required for each process (1 line in the swarm command file).
-t #	Number of threads/CPUs required for each process (1 line in the swarm command file).
--module deepmod2	Loads the deepmod2 module for each subjob in the swarm.