Tumor Heterogeneity Analysis (THetA) is an algorithm used to estimate tumor purity and clonal/subclonal copy number aberrations simultaneously from high-throughput DNA sequencing data.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=14g --cpus-per-task=6
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load theta/0.7-20-g94fd772
[user@cn3144 ~]$ cp -r ${THETA_TEST_DATA:-none}/example/ .
[user@cn3144 ~]$ RunTHetA example/Example.intervals \
--NUM_PROCESSES=$((SLURM_CPUS_PER_TASK - 1)) \
--TUMOR_FILE example/TUMOR_SNP.formatted.txt \
--NORMAL_FILE example/NORMAL_SNP.formatted.txt
=================================================
Arguments are:
Query File: example/Example.intervals
k: 3
tau: 2
Output Directory: ./
Output Prefix: Example
Num Processes: 5
Graph extension: .pdf
Valid sample for THetA analysis:
Ratio Deviation: 0.1
Min Fraction of Genome Aberrated: 0.05
Program WILL cluster intervals.
=================================================
Reading in query file...
[...snip...]
[user@cn3144 ~]$ ls -lh
total 232K
drwxr-xr-x 2 user group 4.0K Oct 13 10:32 example
drwxr-xr-x 9 user group 4.0K Oct 13 16:19 Example_2_cluster_data
drwxr-xr-x 8 user group 4.0K Oct 13 16:20 Example_3_cluster_data
-rw-r--r-- 1 user group 13K Oct 13 16:19 Example_assignment.png
-rw-r--r-- 1 user group 2.0K Oct 13 16:20 Example.BEST.results
-rw-r--r-- 1 user group 118K Oct 13 16:19 Example_by_chromosome.png
-rw-r--r-- 1 user group 17K Oct 13 16:19 Example_classifications.png
-rw-r--r-- 1 user group 16K Oct 13 16:19 Example.n2.graph.pdf
-rw-r--r-- 1 user group 2.0K Oct 13 16:19 Example.n2.results
-rw-r--r-- 1 user group 3.6K Oct 13 16:19 Example.n2.withBounds
-rw-r--r-- 1 user group 17K Oct 13 16:20 Example.n3.graph.pdf
-rw-r--r-- 1 user group 2.2K Oct 13 16:20 Example.n3.results
-rw-r--r-- 1 user group 3.6K Oct 13 16:20 Example.n3.withBounds
-rw-r--r-- 1 user group 225 Oct 13 16:19 Example.RunN3.bash
The analysis creates a number of files, including some graphs. For example, Example.n2.graph.pdf shows one of the models (2 components).
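Before inspecting the graphs, it can be useful to confirm that the main result files were actually written. The following is a minimal sketch, not part of THetA: the function name check_theta_outputs is hypothetical, and the two file names it checks are taken from the directory listing above, on the assumption that they are produced for this example.

```shell
#!/bin/bash
# Hypothetical helper (not part of THetA): report whether the main result
# files for a given output prefix exist and are non-empty in a directory.
# The file names are an assumption based on the example listing.
check_theta_outputs() {
    local dir="$1" prefix="$2" f missing=0
    for f in "$prefix.BEST.results" "$prefix.n2.results"; do
        if [[ ! -s "$dir/$f" ]]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    # Return 0 only if every expected file was found.
    return "$missing"
}
```

In the session above this could be called as `check_theta_outputs . Example`, since the output directory is ./ and the output prefix is Example.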
In addition to RunTHetA, several other tools are included
in this package:
[user@cn3144 ~]$ ls /usr/local/apps/theta/0.7-20-g94fd772/bin
|-- CreateExomeInput
|-- getAlleleCounts
|-- runBICSeqToTHetA
`-- RunTHetA
Two of these tools (getAlleleCounts and runBICSeqToTHetA) are wrappers around
Java tools. In addition to their normal arguments, they also take the --java-opts
argument, which can be used to pass options to Java:
[user@cn3144 ~]$ runBICSeqToTHetA --java-opts="-Xmx2g" --help
Error! Incorrect number of arguments.
Program: BICSeqToTHetA
USAGE (src): java BICSeqToTHetA <INPUT_FILE> [Options]
USAGE (jar): java -jar BICSeqToTHetA <INPUT_FILE> [Options]
<INPUT_FILE> [String]
A file output by BIC-Seq.
-OUTPUT_PREFIX [STRING]
Prefix for all output files.
-MIN_LENGTH [Integer]
The minimum length of intervals to keep.
For a more detailed manual see
/usr/local/apps/theta/<version>/MANUAL.txt
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch script (e.g. THetA.sh) that loads the module and runs RunTHetA on the example data. For example:
#! /bin/bash
module load theta/0.7-20-g94fd772 || exit 1
RunTHetA example/Example.intervals \
    --NUM_PROCESSES=$((SLURM_CPUS_PER_TASK - 1)) \
    --TUMOR_FILE example/TUMOR_SNP.formatted.txt \
    --NORMAL_FILE example/NORMAL_SNP.formatted.txt
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=6 --mem=14g THetA.sh
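The commands above pass SLURM_CPUS_PER_TASK - 1 to --NUM_PROCESSES. If the script were run outside a Slurm allocation, the variable would be unset and the shell arithmetic would evaluate to -1. The following is a minimal sketch of a guard. The helper name theta_num_processes and the fallback of 2 CPUs are illustrative assumptions, not part of THetA or of the cluster configuration.

```shell
#!/bin/bash
# Hypothetical helper: derive a worker count from the Slurm allocation,
# leaving one CPU free as in the examples, and never going below 1.
theta_num_processes() {
    # Assumed fallback of 2 CPUs when not running under Slurm.
    local cpus="${SLURM_CPUS_PER_TASK:-2}"
    local n=$(( cpus - 1 ))
    if (( n < 1 )); then
        n=1
    fi
    echo "$n"
}
```

In a batch script, this could replace the inline arithmetic, for example `--NUM_PROCESSES="$(theta_num_processes)"`.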
Create a swarmfile (e.g. THetA.swarm). For example:
RunTHetA sample1/Example.intervals --NUM_PROCESSES=$((SLURM_CPUS_PER_TASK-1)) \
    --TUMOR_FILE sample1/TUMOR_SNP.formatted.txt --NORMAL_FILE sample1/NORMAL_SNP.formatted.txt
RunTHetA sample2/Example.intervals --NUM_PROCESSES=$((SLURM_CPUS_PER_TASK-1)) \
    --TUMOR_FILE sample2/TUMOR_SNP.formatted.txt --NORMAL_FILE sample2/NORMAL_SNP.formatted.txt
Submit this job using the swarm command.
swarm -f THetA.swarm -g 14 -t 6 --module theta
where
| -g # | Number of gigabytes of memory required for each process (1 line in the swarm command file) |
| -t # | Number of threads/CPUs required for each process (1 line in the swarm command file) |
| --module theta | Loads the theta module for each subjob in the swarm |
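With more than a few samples, writing the swarmfile by hand becomes error-prone. The following is a minimal sketch that prints one command line per sample directory. It assumes each directory uses the same file names as the example (Example.intervals, TUMOR_SNP.formatted.txt, NORMAL_SNP.formatted.txt), and the function name make_theta_swarm is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print one swarm command line per sample directory.
# Assumes every sample directory follows the example's file naming.
make_theta_swarm() {
    local sample
    for sample in "$@"; do
        # Single quotes keep $((SLURM_CPUS_PER_TASK-1)) literal, so it is
        # evaluated later inside each swarm subjob, not when the file is written.
        printf 'RunTHetA %s/Example.intervals --NUM_PROCESSES=$((SLURM_CPUS_PER_TASK-1)) --TUMOR_FILE %s/TUMOR_SNP.formatted.txt --NORMAL_FILE %s/NORMAL_SNP.formatted.txt\n' \
            "$sample" "$sample" "$sample"
    done
}
```

For example, `make_theta_swarm sample1 sample2 > THetA.swarm` would write a two-line swarmfile that can then be submitted with the swarm command shown above.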