VSEARCH supports de novo and reference-based chimera detection, clustering, full-length and prefix dereplication, rereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling, and sorting. It also supports FASTQ file analysis, filtering, conversion, and merging of paired-end reads.
Allocate an interactive session and run the program.
Sample session (user input follows each shell prompt):
[user@biowulf ~]$ sinteractive --mem=4G --gres=lscratch:5
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load vsearch
[user@cn3144 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144 46116226]$ cp $VSEARCH_EXAMPLES/BioMarKs50k.fsa .
[user@cn3144 46116226]$ vsearch --threads 2 --cluster_fast BioMarKs50k.fsa --id 0.97 --centroids vsearch.out
Reading file BioMarKs50k.fsa 100%
19073093 nt in 49958 seqs, min 32, max 497, avg 382
minseqlength 32: 42 sequences discarded.
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 4301 Size min 1, max 932, avg 11.6
Singletons: 1856, 3.7% of seqs, 43.2% of clusters
[user@cn3144 46116226]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
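A quick sanity check on the clustering result is to count the sequences in the centroids file, since --centroids writes one representative sequence per cluster. The sketch below is only an illustration; the helper name count_fasta_records is hypothetical and not part of VSEARCH.

```shell
# Count FASTA records in a file by counting header lines (those starting with '>').
# count_fasta_records is a hypothetical helper name used for illustration only.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Possible usage after the session above (vsearch.out is the centroids file):
#   count_fasta_records vsearch.out
```

For the session above, the count would be expected to agree with the "Clusters:" figure that vsearch reports.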
Create a batch input file (e.g. vsearch.sh). For example:
#!/bin/bash
module load vsearch
vsearch --threads $SLURM_CPUS_PER_TASK --usearch_global queries.fsa --db database.fsa --id 0.9 --alnout alnout.txt
Submit this job using the Slurm sbatch command, replacing # with appropriate values.
sbatch --cpus-per-task=# --mem=# vsearch.sh
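The batch script passes $SLURM_CPUS_PER_TASK directly to --threads. If that variable happens to be unset (for example, when the script is tested outside of a Slurm allocation), --threads would receive no value. A minimal defensive sketch, assuming a fallback of one thread is acceptable; the variable name threads is an illustrative choice:

```shell
# Use the CPU count granted by Slurm when available; otherwise fall back to 1.
# The fallback value of 1 is an assumption; adjust it to suit the job.
threads="${SLURM_CPUS_PER_TASK:-1}"
echo "vsearch will use ${threads} thread(s)"

# In the batch script, the vsearch line would then read:
#   vsearch --threads "$threads" --usearch_global queries.fsa --db database.fsa --id 0.9 --alnout alnout.txt
```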
Create a swarmfile (e.g. vsearch.swarm). For example:
vsearch --threads $SLURM_CPUS_PER_TASK --usearch_global queries1.fsa --db database.fsa --id 0.9 --alnout alnout1.txt
vsearch --threads $SLURM_CPUS_PER_TASK --usearch_global queries2.fsa --db database.fsa --id 0.9 --alnout alnout2.txt
vsearch --threads $SLURM_CPUS_PER_TASK --usearch_global queries3.fsa --db database.fsa --id 0.9 --alnout alnout3.txt
vsearch --threads $SLURM_CPUS_PER_TASK --usearch_global queries4.fsa --db database.fsa --id 0.9 --alnout alnout4.txt
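With many query files, the swarmfile can be generated rather than typed by hand. The following is a sketch that assumes the query files match queries*.fsa in the current directory and that the database is named database.fsa, as in the example; make_swarmfile is a hypothetical helper name.

```shell
# Write one vsearch command per query file into a swarmfile.
# Assumes query files are named queries<N>.fsa, as in the example above.
# make_swarmfile is a hypothetical helper, not part of swarm or VSEARCH.
make_swarmfile() {
    swarmfile="${1:-vsearch.swarm}"
    : > "$swarmfile"                      # create or truncate the swarmfile
    for q in queries*.fsa; do
        [ -e "$q" ] || continue           # skip the literal pattern if nothing matches
        n="${q#queries}"                  # strip the "queries" prefix
        n="${n%.fsa}"                     # strip the ".fsa" suffix, leaving <N>
        # Single quotes keep $SLURM_CPUS_PER_TASK literal, so it is expanded
        # when each subjob runs rather than when the swarmfile is written.
        printf 'vsearch --threads $SLURM_CPUS_PER_TASK --usearch_global %s --db database.fsa --id 0.9 --alnout alnout%s.txt\n' \
            "$q" "$n" >> "$swarmfile"
    done
}

# Possible usage:
#   make_swarmfile vsearch.swarm
```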
Submit this job using the swarm command.
swarm -f vsearch.swarm [-g #] [-t #] --module vsearch
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file). |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module vsearch | Loads the VSEARCH module for each subjob in the swarm. |