APOLLOH
APOLLOH v0.1.0 is a hidden Markov model (HMM) for predicting somatic loss of heterozygosity and allelic imbalance in whole tumour genome sequencing data.
Please feel free to contact me if you have any questions regarding this software,
Gavin Ha (gha [at] bccrc [dot] ca)
Last modified: February 14, 2012 at 4:28pm
Downloads
Installation Instructions
- Download and extract APOLLOH_0.1.0 into the desired folder <$install_dir>:
cd <$install_dir>
tar xvzf APOLLOH_0.1.0.tar.gz
- Install the MATLAB Component Runtime (MCR). The software was originally compiled on Linux, so the MCR installation must be for the 64-bit Linux (glnxa64) architecture and version 77. You can download MCRInstaller.bin here. You will need to specify the directory for this installation: <$install_dir>/<$mcr_dir>/
- Alternatively, if you have Matlab installed with the MCR compiler toolbox, you can compile the software to work on your machine’s architecture. Simply feed the script compileAPOLLOH_0.1.0.m into the Matlab executable command:
cd <$install_dir>/APOLLOH_0.1.0/bin/
<$matlab_dir>/bin/matlab -nodesktop < compileAPOLLOH_0.1.0.m
Running the compiled software
There are two ways to run the compiled software: 1) the executable or 2) the shell script. Both are produced by Matlab as a result of using the MCR compiler. If you already have the MCR installed and added to your path (specifically the LD_LIBRARY_PATH environment variable), you can use the executable; otherwise, use the shell script, which lets you specify the MCR install path manually.
> # In Linux
> cd ../bin/
> ### Run APOLLOH ###
> ./run_apolloh.sh ../MATLAB_Component_Runtime/v77/ \
<$input_allelic_ratio_datafile> \
<$input_copy_number_datafile> \
../parameters/K18/apolloh_K18_params_Illumina_stromalRatio_Hyper10k.mat \
<$output_converged_parameter_file> <$output_results_file>
> ### Sort data by chromosome and position ###
> sort -k1,1n -k2,2n <$output_results_file> > tmp.txt
> mv tmp.txt <$output_results_file>
> ### Generate segment file ###
> ../scripts/createSingleSegFileFromAPOLLOH.pl -i <$output_results_file> \
-o <$output_segment_results_file> -calls 1;
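For the executable option, the MCR libraries have to be on LD_LIBRARY_PATH before launch. The following is a minimal sketch, assuming the MCR lives under ../MATLAB_Component_Runtime/v77/ as in the commands above. The helper name mcr_lib_path is hypothetical, and the three glnxa64 subdirectories are an assumption based on what Matlab-generated wrapper scripts typically add; compare them with the paths set inside run_apolloh.sh for your MCR version.

```shell
#!/bin/sh
# Hypothetical helper: print a library search path for a given MCR root.
# Assumption: runtime/, bin/ and sys/os/ under glnxa64 hold the shared
# libraries; check the list against the one in run_apolloh.sh.
mcr_lib_path() {
    mcr_root=${1%/}    # drop a trailing slash, if any
    printf '%s\n' "$mcr_root/runtime/glnxa64:$mcr_root/bin/glnxa64:$mcr_root/sys/os/glnxa64"
}

# Prepend the MCR paths to any existing LD_LIBRARY_PATH and export it.
LD_LIBRARY_PATH="$(mcr_lib_path ../MATLAB_Component_Runtime/v77/)${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
```

Once the variable is exported in the current shell, the executable can be started directly instead of through the shell script.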
Running the software in Matlab
If you have Matlab installed and wish to run within the Matlab environment, then you can start up Matlab and add the source files before executing the main function.
> # In Linux
> cd <$install_dir>/APOLLOH_0.1.0/bin/
> <$matlab_dir>/bin/matlab
>> % In Matlab
>> addpath(genpath('<$install_dir>/APOLLOH_0.1.0/hmm'))
>> addpath(genpath('<$install_dir>/APOLLOH_0.1.0/util'))
>> % assuming you have all the arguments to the main function assigned...
>> apolloh(infile,cnfile,paramset,outparam,outfile)
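The same call can also be run without an interactive session by passing the statements to Matlab's -r option. The sketch below only builds the statement string. The helper name apolloh_batch_cmd is hypothetical, and it assumes that all five arguments are given as file-path strings (as they are on the command line of the compiled version) and contain no single quotes.

```shell
#!/bin/sh
# Hypothetical helper: build the Matlab statements for one APOLLOH run.
# Usage: apolloh_batch_cmd <apolloh_dir> <infile> <cnfile> <paramset> <outparam> <outfile>
apolloh_batch_cmd() {
    # Add the source directories to the Matlab path, as in the session above.
    printf "addpath(genpath('%s/hmm')); addpath(genpath('%s/util')); " "$1" "$1"
    # Call the main function, then leave Matlab.
    printf "apolloh('%s','%s','%s','%s','%s'); exit\n" "$2" "$3" "$4" "$5" "$6"
}

# Example invocation (commented out so the block has no side effects):
# <$matlab_dir>/bin/matlab -nodesktop -nosplash \
#     -r "$(apolloh_batch_cmd <$install_dir>/APOLLOH_0.1.0 in.txt cn.txt params.mat out_params.txt out.txt)"
```

If apolloh stops with an error, the trailing exit is not reached and Matlab stays at its prompt, so a batch job may need a timeout.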
Inputs and outputs
When using the compiled executable or running within the Matlab environment, the same input/output files and parameters are required:
Function apolloh(infile,cnfile,paramset,outparam,outfile)
INPUTS:
infile Tab-delimited input file containing allelic counts
from the tumour at positions determined as heterozygous
from the normal genome.
6 columns:
1) chr
2) position
3) reference base
4) reference count
5) non-reference base
6) non-reference count
cnfile Tab-delimited input copy number segment prior file.
The accepted format is the output from HMMcopy,
a read-depth tool for analyzing copy number in
tumour-normal sequenced genomes.
8 columns:
1) id (can be arbitrary, not used)
2) chr
3) start
4) stop
5) Number of 1kb intervals (can be arbitrary; not used)
6) median log2 ratio (normal and tumour) for segment
7) HMM state: 1=HOMD, 2=HEMD, 3=NEUT, 4=GAIN,
5=AMP, 6=HLAMP
8) CN state
If the string '0' is used, then a copy number of 2 (diploid) is used.
paramset Parameter initialization file is a Matlab binary (.mat) file.
This file contains model and setting parameters necessary
to run the program.
See examples in "<$install_dir>/parameters/".
outfile Tab-delimited output file for position-level results.
9 columns:
1) chr
2) position
3) reference count
4) non-reference count
5) total depth
6) allelic ratio
7) copy number (from input)
8) APOLLOH genotype state
9) Zygosity state.
N additional columns:
posterior marginal probabilities (responsibilities) for
each APOLLOH genotype state.
Segment boundaries are determined as consecutive
marginal states of DLOH, NLOH, ALOH, HET, BCNA,
ASCNA; this implementation does not output this
information. An external Perl script handles this:
"<$install_dir>/scripts/analysis/createSingleSegFileFromAPOLLOH.pl"
outparam Tab-delimited output file storing converged parameters
after model training using the Expectation Maximization (EM)
algorithm.
1) Number of iterations
2) Global normal contamination parameter
3) Binomial parameters for each HMM class/state.
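The segment-boundary rule described for outfile (runs of consecutive positions sharing the same state) can be illustrated with a short awk sketch. This is not the author's Perl script and does not reproduce its output format. It assumes a tab-delimited results file that is already sorted by chromosome and position, has no header line, and carries the zygosity state in column 9. The function name segments_from_results is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: collapse consecutive positions with the same
# chromosome and zygosity state (column 9) into segments.
# Assumptions: tab-delimited input, sorted by chromosome and position,
# no header line.
# Output columns: chr, start, stop, state, number of positions.
segments_from_results() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        # A change of chromosome or state closes the open segment.
        NR > 1 && ($1 != chr || $9 != state) { print chr, start, stop, state, n; n = 0 }
        # Open a new segment when none is open.
        n == 0 { chr = $1; start = $2; state = $9 }
        # Extend the open segment to the current position.
        { stop = $2; n++ }
        END { if (n > 0) print chr, start, stop, state, n }' "$1"
}

# Example (commented out): segments_from_results <$output_results_file>
```

Grouping on column 8 instead would give segments at the level of the APOLLOH genotype state rather than the zygosity state.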

