bcl-convert on Biowulf

Illumina's bcl-convert is the intended successor to bcl2fastq. It converts Binary Base Call (BCL) files produced by Illumina sequencing systems to FASTQ files. bcl-convert also handles adapter sequences (by masking or trimming), trims UMIs, and produces metrics output.

On Biowulf, bcl-convert must currently be run on an exclusively allocated node so that it does not overload a shared compute node. In addition, certain bcl-convert options have been preset, and setting them yourself will cause an error. See Important Notes below!

Important Notes

bcl-convert Options

Do NOT set the following options when running bcl-convert:

sbatch/sinteractive Options

Set the following sbatch/sinteractive options as described below. The first three are required; the last two are optional.

Option            Explanation/how-to
--exclusive       Required. The node must be allocated exclusively; otherwise bcl-convert will overload the CPUs and run inefficiently (more slowly).
--constraint      Required. The number of CPUs on the allocated node must be known so that bcl-convert runs the correct number of threads. Use the freen command to list the available node types and select one (see the example below).
--cpus-per-task   Required. Set this to the number of CPUs on the node type you are requesting.
--gres=lscratch   Optional. bcl-convert writes temporary logs to lscratch, and writing the output to lscratch may also be beneficial (see the example session below).
--mem             Optional. Set this to all of the memory available on the node type you are requesting.
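
Because --cpus-per-task must equal the CPU count of the node type you request, a mismatch is easy to introduce and hard to notice. The sketch below is a hypothetical pre-flight check (check_bclconvert_cpus is not part of bcl-convert or Biowulf) that could be placed at the top of a job. It assumes that the logical CPU count reported by nproc --all matches the CPUs column that freen shows for the node type.

```shell
#!/bin/bash
# Hypothetical pre-flight check, not part of bcl-convert or Biowulf.
# Assumption: `nproc --all` (logical CPUs installed on the node) equals the
# CPUs column that freen shows for this node type.
check_bclconvert_cpus() {
  local node_cpus requested
  node_cpus=$(nproc --all)
  # Slurm sets SLURM_CPUS_PER_TASK only when --cpus-per-task was given
  requested=${SLURM_CPUS_PER_TASK:-0}
  if [ "$requested" -ne "$node_cpus" ]; then
    echo "ERROR: --cpus-per-task=$requested but this node has $node_cpus CPUs" >&2
    return 1
  fi
  echo "OK: all $node_cpus CPUs on this node are allocated"
}
```

Called near the top of a batch script that uses set -e, a nonzero return ends the job before bcl-convert starts.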

Example session to choose parameters

biowulf% freen
                                                    .......Per-Node Resources......
Partition    FreeNds      FreeCPUs           Cores  CPUs  GPUs    Mem   Disk Features
-------------------------------------------------------------------------------------------------------
norm         0 / 118    1478 / 8496          36       72         369g  3200g cpu72,core36,g384,ssd3200,x6140,ibhdr100
norm         0 / 72     1786 / 5184          36       72         369g  3200g cpu72,core36,g384,ssd3200,x6240,ibhdr100
norm        10 / 397    5796 / 22232         28       56         243g   800g cpu56,core28,g256,ssd800,x2680,ibfdr
norm         0 / 5       216 / 280           28       56         243g  1800g cpu56,core28,g256,ssd1800,x2680,ibfdr
[...]
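Of the node types shown, only 'cpu56' (56 CPUs, 243 GB of RAM) has any entirely free nodes (FreeNds column), so an --exclusive job requesting this type can start without waiting for running jobs to finish. To submit to one of these nodes, your sbatch or sinteractive command would include the parameters: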
--exclusive --constraint=cpu56 --cpus-per-task=56 --mem=243g --gres=lscratch:400
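
The selection above can also be scripted. The hypothetical pick_bclconvert_opts function below (not a Biowulf tool) reads freen output on standard input and prints the options for the first node type in a partition that has at least one entirely free node. It assumes the column layout shown above: FreeNds is the second field, Mem is third from the end, and the Features list (containing cpuN) is the last field. If the freen layout changes, the parsing breaks, and a free node may be taken by the time your job is scheduled.

```shell
#!/bin/bash
# Hypothetical helper, not part of Biowulf: pick Slurm options from freen output.
# Usage: freen | pick_bclconvert_opts [lscratch_GB] [partition]
# Assumes freen's layout: FreeNds in field 2, Mem third from last, Features last.
pick_bclconvert_opts() {
  local lscratch_gb=${1:-400} partition=${2:-norm}
  awk -v lscratch="$lscratch_gb" -v part="$partition" '
    # Use only rows of the chosen partition with FreeNds > 0 and a cpuN feature
    $1 == part && ($2 + 0) > 0 && match($NF, /cpu[0-9]+/) {
      cpus = substr($NF, RSTART + 3, RLENGTH - 3)   # "cpu56" -> "56"
      mem = $(NF - 2)                                # e.g. "243g"
      printf "--exclusive --constraint=cpu%s --cpus-per-task=%s --mem=%s --gres=lscratch:%s\n", cpus, cpus, mem, lscratch
      exit
    }'
}
```

On the freen output shown above, freen | pick_bclconvert_opts would print the same parameter line as the example. If no node type has free nodes, it prints nothing.
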
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):

[user@biowulf]$ sinteractive --exclusive --constraint=cpu56 --cpus-per-task=56 --mem=243g --gres=lscratch:400
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ cd /lscratch/$SLURM_JOBID

[user@cn3144 46116226]$ mkdir sample_bclconvert_output

[user@cn3144 46116226]$ module load bcl-convert

[user@cn3144 46116226]$ bcl-convert --bcl-input-directory /data/$USER/sample-run \
                          --output-directory sample_bclconvert_output
Index Read 2 is marked as Reverse Complement in RunInfo.xml: The barcode and UMI outputs will be output in Reverse Complement of Sample Sheet inputs.
Sample sheet being processed by common lib? Yes
SampleSheet Settings:
  AdapterRead1 = CAAGCAGAAGACGGCATACGAGAT
  AdapterRead2 = CAAGCAGAAGACGGCATACGAGAT
  FastqCompressionFormat = gzip
  SoftwareVersion = 3.7.4

shared-thread-linux-native-asio output is disabled
bcl-convert Version 00.000.000.3.9.3
Copyright (c) 2014-2018 Illumina, Inc.
...
[user@cn3144 46116226]$ mv sample_bclconvert_output /data/$USER/

[user@cn3144 46116226]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
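
Because /lscratch/$SLURM_JOBID is deleted when the job ends, the final mv in the session above is what preserves the results. The hypothetical save_bclconvert_output function below (not part of bcl-convert or Biowulf) always moves the directory, refuses to nest it inside an existing copy, and then reports whether any FASTQ files arrived. It assumes gzip-compressed output named *.fastq.gz, matching the FastqCompressionFormat = gzip setting in the log.

```shell
#!/bin/bash
# Hypothetical helper, not part of bcl-convert or Biowulf.
# Move a bcl-convert output directory off /lscratch (deleted when the job
# ends), then fail if the moved copy contains no gzipped FASTQ files.
save_bclconvert_output() {
  local src=$1 dest=$2 moved n
  moved="$dest/$(basename "$src")"
  # mv would otherwise nest src inside an existing directory of the same name
  if [ -e "$moved" ]; then
    echo "ERROR: $moved already exists; not overwriting" >&2
    return 1
  fi
  mv "$src" "$dest"/ || return 1
  # Assumes FastqCompressionFormat = gzip, i.e. files named *.fastq.gz
  n=$(find "$moved" -type f -name '*.fastq.gz' | wc -l)
  if [ "$n" -eq 0 ]; then
    echo "WARNING: no .fastq.gz files in $moved" >&2
    return 1
  fi
  echo "Found $n FASTQ file(s) in $moved"
}
```

In the session above, save_bclconvert_output sample_bclconvert_output /data/$USER would take the place of the final mv; logs and reports are still saved when no FASTQ files were produced.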

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. bcl-convert.sh). For example:

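#!/bin/bash
set -e
# Write output to node-local scratch (requires --gres=lscratch:N at submission)
OUTDIR=/lscratch/$SLURM_JOBID/sample-output
mkdir -p "$OUTDIR"
module load bcl-convert
# Do not add any of the preset options listed under Important Notes
bcl-convert --bcl-input-directory /data/$USER/sample-run --output-directory "$OUTDIR"
# /lscratch is deleted when the job ends, so move the results to /data
mv "$OUTDIR" /data/$USER/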

Submit this job using the Slurm sbatch command.

sbatch --exclusive --constraint=cpu56 --cpus-per-task=56 --mem=243g --gres=lscratch:400 bcl-convert.sh
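
A batch job may wait in the queue for a whole node before bcl-convert discovers a problem with its input. The hypothetical check_run_folder function below (not part of bcl-convert or Biowulf) tests for RunInfo.xml and the sample sheet before you submit. It assumes the sample sheet is at its default location, SampleSheet.csv at the top of the run folder; if you pass a different sheet to bcl-convert with --sample-sheet, adjust the list.

```shell
#!/bin/bash
# Hypothetical pre-submission check, not part of bcl-convert or Biowulf.
# Assumes the sample sheet is at its default location, <run>/SampleSheet.csv.
check_run_folder() {
  local run=$1 f status=0
  for f in RunInfo.xml SampleSheet.csv; do
    if [ ! -f "$run/$f" ]; then
      echo "missing: $run/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: check_run_folder /data/$USER/sample-run && sbatch --exclusive --constraint=cpu56 --cpus-per-task=56 --mem=243g --gres=lscratch:400 bcl-convert.sh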