APOLLOH

APOLLOH v0.1.0 is a hidden Markov model (HMM) for predicting somatic loss of heterozygosity and allelic imbalance in whole tumour genome sequencing data.

Please feel free to contact me if you have any questions regarding this software,
Gavin Ha (gha [at] bccrc [dot] ca)

Last modified: February 14, 2012 at 4:28pm

Downloads

APOLLOH_0.1.0.tar.gz

MCRInstaller.bin

Installation Instructions

  1. Download and extract APOLLOH_0.1.0 into the desired folder <$install_dir>
    cd <$install_dir>
    tar xvzf APOLLOH_0.1.0.tar.gz
  2. Install the MATLAB Component Runtime (MCR). The software was compiled on Linux, so the MCR installation must be for the 64-bit Linux (glnxa64) architecture, version 7.7 (v77). MCRInstaller.bin is listed under Downloads above.
    You will need to specify the installation directory for the MCR, e.g. <$install_dir>/<$mcr_dir>/
  3. Alternatively, if you have Matlab installed with the MATLAB Compiler toolbox, you can compile the software for your machine's architecture. Simply feed the script compileAPOLLOH_0.1.0.m to the Matlab executable:

    cd <$install_dir>/APOLLOH_0.1.0/bin/
    <$matlab_dir>/bin/matlab -nodesktop < compileAPOLLOH_0.1.0.m 

     

Running the compiled software

There are two ways to run the compiled software: 1) the executable or 2) the shell script. Both are produced by the Matlab compiler. If the MCR is already installed and on your library path (specifically the LD_LIBRARY_PATH environment variable), you can use the executable; otherwise, use the shell script, which lets you specify the MCR install path manually.
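To use the executable directly, the MCR libraries must be on LD_LIBRARY_PATH. The functions below are a minimal sketch of that setup, assuming the usual glnxa64 MCR directory layout (runtime/glnxa64, bin/glnxa64, sys/os/glnxa64). The names mcr_ld_path and use_mcr are hypothetical, and the generated run_apolloh.sh may add further directories (for example for Java), so compare against it if the executable still fails to load its libraries.

```shell
# Sketch: put the MCR v77 libraries on LD_LIBRARY_PATH so the compiled
# executable can be run without run_apolloh.sh.
# ASSUMPTION: standard glnxa64 MCR layout; run_apolloh.sh may add more
# directories, so compare with it if loading still fails.
mcr_ld_path() (
  # usage: mcr_ld_path <mcr_root>, e.g. ../MATLAB_Component_Runtime/v77/
  # (body runs in a subshell, so its variables do not leak)
  mcr_root="${1%/}"                 # drop one trailing slash
  out=""
  for d in runtime/glnxa64 bin/glnxa64 sys/os/glnxa64; do
    out="${out:+$out:}$mcr_root/$d"
  done
  printf '%s\n' "$out"
)

use_mcr() {
  # prepend the MCR directories to any existing LD_LIBRARY_PATH
  LD_LIBRARY_PATH="$(mcr_ld_path "$1")${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export LD_LIBRARY_PATH
}
```

For example, run `use_mcr ../MATLAB_Component_Runtime/v77/` in the current shell, then call the executable with the remaining arguments of the run_apolloh.sh example below (everything after the MCR path). The executable name apolloh is an assumption based on the Matlab compiler's usual pairing of `<name>` with `run_<name>.sh`.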

> # In Linux
> cd <$install_dir>/APOLLOH_0.1.0/bin/
> ### Run APOLLOH ###


> ./run_apolloh.sh ../MATLAB_Component_Runtime/v77/ \
<$input_allelic_ratio_datafile> \
<$input_copy_number_datafile> \
../parameters/K18/apolloh_K18_params_Illumina_stromalRatio_Hyper10k.mat \
<$output_converged_parameter_file> <$output_results_file>


> ### Sort data by chromosome and position ###
> sort -k1,1n -k2,2n <$output_results_file> > tmp.txt
> mv tmp.txt <$output_results_file>


> ### Generate segment file ###
> ../scripts/createSingleSegFileFromAPOLLOH.pl -i <$output_results_file> \
-o <$output_segment_results_file> -calls 1;

 

Running the software in Matlab

If you have Matlab installed and wish to run APOLLOH within the Matlab environment, start Matlab and add the source directories to the Matlab path before calling the main function.

> # In Linux
> cd <$install_dir>/APOLLOH_0.1.0/bin/
> <$matlab_dir>/bin/matlab
>> % In Matlab
>> addpath(genpath('<$install_dir>/APOLLOH_0.1.0/hmm'))
>> addpath(genpath('<$install_dir>/APOLLOH_0.1.0/util'))
>> % assuming you have all the arguments to the main function assigned...
>> apolloh(infile,cnfile,paramset,outparam,outfile)

 

Inputs and outputs

When using the compiled executable or running within the Matlab environment, the same input/output files and parameters are required:


Function apolloh(infile,cnfile,paramset,outparam,outfile)

INPUTS:  
infile         Tab-delimited input file containing allelic counts
                 from the tumour at positions determined to be heterozygous
                 in the normal genome.
                6 columns:
                    1) chr
                    2) position
                    3) reference base
                    4) reference count
                    5) non-reference base
                    6) non-reference count
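Before running APOLLOH it can help to sanity-check infile. The awk-based function below is a minimal sketch (check_infile is a hypothetical name) that checks each line has the six tab-delimited columns listed above with integer counts, and prints chr, position, total depth and allelic ratio. It assumes the allelic ratio is the reference count divided by the total depth; confirm this against column 6 of the outfile for your data. Positions with zero depth are reported as NA.

```shell
# Sketch: validate an APOLLOH infile (6 tab-delimited columns) and print
# chr, position, total depth and allelic ratio for each line.
# ASSUMPTION: allelic ratio = reference count / (reference + non-reference count).
check_infile() {
  # usage: check_infile [infile]   (reads stdin if no file is given)
  awk -F'\t' -v OFS='\t' '
    NF != 6 {
      printf "line %d: expected 6 columns, found %d\n", NR, NF > "/dev/stderr"
      bad = 1; next
    }
    $4 !~ /^[0-9]+$/ || $6 !~ /^[0-9]+$/ {
      printf "line %d: counts must be non-negative integers\n", NR > "/dev/stderr"
      bad = 1; next
    }
    {
      depth = $4 + $6
      ratio = (depth > 0) ? sprintf("%.4f", $4 / depth) : "NA"
      print $1, $2, depth, ratio
    }
    END { exit (bad ? 1 : 0) }
  ' "$@"
}
```

The function exits with status 1 if any line was malformed, so `check_infile <$input_allelic_ratio_datafile> > /dev/null` can be used as a quick pre-flight check.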

cnfile         Tab-delimited input file of copy number segments used as a prior.
                  The accepted format is the output of HMMcopy,
                  a read-depth-based tool for analyzing copy number in
                  tumour-normal sequenced genomes.
                  8 columns:
                    1) id (can be arbitrary; not used)
                    2) chr
                    3) start
                    4) stop
                    5) Number of 1kb intervals (can be arbitrary; not used)
                    6) median log2 ratio (tumour vs. normal) for segment
                    7) HMM state: 1=HOMD, 2=HEMD, 3=NEUT, 4=GAIN,
                        5=AMP, 6=HLAMP
                    8) CN state
                  If the string '0' is given instead of a file name, a copy
                  number of 2 (diploid) is used for all positions.
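To see how the segments relate to the per-position copy number reported in column 7 of outfile, the function below sketches one way to attach the CN state (column 8) of the overlapping segment to each infile position. It is an illustration only: annotate_cn is a hypothetical name, it assumes the cnfile has no header line and that chromosome names match between the two files, and giving uncovered positions a copy number of 2 is a choice of this sketch rather than documented APOLLOH behaviour.

```shell
# Sketch: attach a copy number to every position of an APOLLOH infile using
# an HMMcopy-style cnfile (chr in column 2, start/stop in 3/4, CN state in 8).
# Prints: chr, position, copy number.
# ASSUMPTIONS: no header line in the cnfile; positions outside every segment
# get copy number 2 (a choice of this sketch, not documented behaviour).
annotate_cn() {
  # usage: annotate_cn <cnfile|0> <infile>
  if [ "$1" = "0" ]; then
    # the '0' convention: diploid (CN 2) everywhere
    awk -F'\t' -v OFS='\t' '{ print $1, $2, 2 }' "$2"
    return
  fi
  awk -F'\t' -v OFS='\t' '
    NR == FNR {                       # first file: store segments per chr
      k = ++n[$2]
      s[$2, k] = $3 + 0; e[$2, k] = $4 + 0; cn[$2, k] = $8
      next
    }
    {                                 # second file: look up each position
      c = 2
      for (i = 1; i <= n[$1]; i++)
        if ($2 + 0 >= s[$1, i] && $2 + 0 <= e[$1, i]) { c = cn[$1, i]; break }
      print $1, $2, c
    }
  ' "$1" "$2"
}
```

The lookup scans every segment on the chromosome for each position, which keeps the sketch short; a sorted, single-pass merge would be the natural choice for whole-genome inputs.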

paramset       Parameter initialization file, a Matlab binary (.mat) file.
                      This file contains the model and setting parameters
                      necessary to run the program.
                      See examples in "<$install_dir>/parameters/".

outfile        Tab-delimited output file for position-level results.
                  9 columns:
                     1) chr
                     2) position
                     3) reference count
                     4) non-reference count
                     5) total depth
                     6) allelic ratio
                     7) copy number (from input)
                     8) APOLLOH genotype state
                     9) Zygosity state
                  N additional columns:
                     posterior marginal probabilities (responsibilities) for
                     each APOLLOH genotype state.

                  Segments are runs of consecutive positions that share
                   the same marginal state (DLOH, NLOH, ALOH, HET, BCNA,
                   ASCNA); this implementation does not output segments.
                   An external Perl script generates them:
                   "<$install_dir>/scripts/analysis/createSingleSegFileFromAPOLLOH.pl"
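The idea of a segment as a run of consecutive positions with the same state can be sketched as a simple run-length pass over the sorted outfile. This is not the logic of createSingleSegFileFromAPOLLOH.pl, only an illustration of the concept; collapse_segments is a hypothetical name, and reading the state from column 9 (zygosity state) by default is an assumption that can be overridden with a second argument.

```shell
# Sketch: collapse consecutive rows of a sorted APOLLOH outfile that share the
# same chromosome and the same state into segments.
# Prints: chr, start, end, number of positions, state.
# ASSUMPTION: state read from column 9 unless another column is given; this
# only illustrates run-length segmentation, not the bundled Perl script.
collapse_segments() {
  # usage: collapse_segments <outfile> [state_column]
  awk -F'\t' -v OFS='\t' -v col="${2:-9}" '
    function flush() { if (n > 0) print chr, start, end, n, state }
    $1 != chr || $col != state {      # new chromosome or new state: close run
      flush()
      chr = $1; start = $2; state = $col; n = 0
    }
    { end = $2; n++ }
    END { flush() }
  ' "$1"
}
```

Because runs are only broken by a change of chromosome or state, the input must already be sorted by chromosome and position, as done by the sort step in the pipeline above.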

outparam       Tab-delimited output file storing the converged parameters
                       after model training with the Expectation Maximization (EM)
                       algorithm:
                     1) Number of iterations 
                     2) Global normal contamination parameter 
                     3) Binomial parameters for each HMM class/state.