TITAN – Installation

Installation of TitanCNA R package

1. R version

R version 3.1 or higher is required. Because the R package contains C code, a working C compiler such as gcc is also required. On Mac machines, Xcode is required. On Windows, installing Rtools should suffice.

2. Dependencies

Install these R package dependencies before proceeding to the TITAN installation. Some are installed from within an R session and others from the command line, as indicated for each package.

i. HMMcopy suite 

TITAN requires several elements of the HMMcopy suite. Download and install the HMMcopy suite (BAMtools and R package) using cmake and Bioconductor.
TITAN uses HMMcopy to perform the following:

  1. Preprocessing of tumour and normal BAM files to extract read counts
  2. GC content correction

The Bioconductor component of HMMcopy can be installed from within R (required for TitanCNA v1.0.0; not required for TitanCNA v1.2.0 or greater):

~$ [R dir]/bin/R
source("http://bioconductor.org/biocLite.R")
biocLite("HMMcopy")

Please also download the HMMcopy suite, because the pre-processing tools it includes are required.

ii. foreach (R library; required)

Install the foreach R package from within R using:

~$ [R dir]/bin/R
install.packages("foreach")

iii. IRanges (required for TitanCNA v1.2.0+)

The IRanges package (>=1.99.1) is required for TitanCNA v1.2.0 and later. This version of IRanges is only available from Bioconductor v3.0 or greater.
To obtain it, download the IRanges source package from the Bioconductor development release (bioc 3.0) and install it with the following command.

# From the command line
~$ [R dir]/bin/R CMD INSTALL IRanges_1.99.15.tar.gz

iv. Rsamtools (required for TitanCNA v1.2.0+)

The Rsamtools package (>=1.17.11) is required for TitanCNA v1.2.0 and later. This version of Rsamtools is only available from Bioconductor v3.0 or greater.
To obtain it, download the Rsamtools source package from the Bioconductor development release (bioc 3.0) and install it with the following command.

# From the command line
~$ [R dir]/bin/R CMD INSTALL Rsamtools_1.17.23.tar.gz

v. GenomeInfoDb (required for TitanCNA v1.2.0+)

The GenomeInfoDb package (>=1.1.3) is required for TitanCNA v1.2.0 and later. This version of GenomeInfoDb is only available from Bioconductor v3.0 or greater.
To obtain it, download the GenomeInfoDb source package from the Bioconductor development release (bioc 3.0) and install it with the following command.

# From the command line
~$ [R dir]/bin/R CMD INSTALL GenomeInfoDb_1.1.6.tar.gz

vi. doMC library (optional)

doMC is optional and is only needed to run TITAN in parallel. It is a package for using multiple cores within a single machine via forking. Other libraries such as doMPI would also work for users installing on a grid engine that supports MPI.

~$ [R dir]/bin/R
install.packages("doMC")

3. TitanCNA

i. TitanCNA v1.3.0

TITAN is implemented as an R package, TitanCNA (v1.3.0). It is available via the stable release of Bioconductor (v3.0).

~$ [R dir]/bin/R
source("http://bioconductor.org/biocLite.R")
biocLite("TitanCNA")

ii. TitanCNA v1.5.2

To obtain TitanCNA (v1.5.2), download the TitanCNA source package from the Bioconductor development release (bioc 3.1, as noted in the version changes below) and install it using

# From the command line
~$ [R dir]/bin/R CMD INSTALL TitanCNA_1.5.2.tar.gz

Note: TitanCNA v1.2.1+ includes a critical bug fix related to memory usage (see the version changes below).

iii. TitanCNA nightly build

To obtain the latest nightly build of TitanCNA, please check out the source code from the Bioconductor SVN repo.

svn co https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/TitanCNA/
user: readonly
pass: readonly

iv. TitanCNA GitHub repository

To obtain the latest development version, visit the TitanCNA GitHub repository at https://github.com/gavinha/TitanCNA.

Changes here will also be reflected in the Bioconductor nightly build and development versions.

See the changes to versions of TitanCNA below.


Installation of Python ruffus pipeline

1. Python version

or higher needs to be installed.

2. Python packages

The following Python packages are required:

  1. ruffus (version 2.2 or higher)
  2. argparse
  3. pysam
  4. lockfile
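
Before running the pipeline, it can help to confirm that these packages are importable. The snippet below is a minimal sketch, not part of TITANRunner: the helper name missing_modules is hypothetical, and it assumes each package's import name matches the name listed above.

```python
import importlib

# Import names are assumed to match the package names in the list above.
REQUIRED_MODULES = ["ruffus", "argparse", "pysam", "lockfile"]


def missing_modules(names):
    """Return the names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python packages: " + ", ".join(missing))
    else:
        print("All required Python packages were found.")
```

Run it with the same Python interpreter that will run the pipeline, since packages installed for one interpreter are not visible to another. Note that this only checks importability; it does not verify the ruffus version.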

3. 3rd-party tools

The following 3rd-party tools are required:

  1. bcftools
  2. SAMtools (version 0.1.18 or higher)
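
A similar check can be sketched for the 3rd-party tools. The example below is illustrative only and assumes Python 3.3 or later (for shutil.which) and that the executables are named bcftools and samtools on the PATH; the helper names are hypothetical. The version string is supplied by hand, because how a tool reports its version differs between releases.

```python
import re
import shutil

# Executable names are assumptions; adjust them if your install differs.
REQUIRED_TOOLS = ["bcftools", "samtools"]
MIN_SAMTOOLS = (0, 1, 18)  # minimum SAMtools version stated above


def missing_tools(names):
    """Return the tools from `names` that are not found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


def parse_version(text):
    """Parse the leading dotted number of a version string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError("no leading version number in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version_text, minimum=MIN_SAMTOOLS):
    """Return True if `version_text` is at least `minimum`."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Tools not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

For example, meets_minimum("0.1.17") is False while meets_minimum("0.1.18") is True. Tuples are compared element by element, so a two-part version such as "1.9" also compares correctly against (0, 1, 18).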

4. Installing TITANRunner ruffus pipeline

Download the Python ruffus pipeline source code from the Downloads page.

Unzip the archive:

unzip TITANRunner-X.X.X.zip

See the TITANRunner Pipeline page for details on usage.


TitanCNA version changes

=================================================================================
TitanCNA version 1.5.5 changes (Committed revision XXXXX.)
=================================================================================
1) Modified function
– Affected function: loadDefaultParameters
– Added argument: data
– For symmetric=TRUE, the heterozygous states (4,9,25) are set to the median allelic ratio of the entire genome if data is given
– “data” is the output from loadAlleleCounts()
– This corrects the issue where the entire genome is mostly copy neutral LOH (NLOH) when it should most likely be copy neutral heterozygous (HET).

2) Modified function
– Affected function: outputModelParameters
– Argument removed: S_Dbw.data.type
– Will output S_Dbw for datatypes: LogRatio, AllelicRatio, and Both
– “Both” refers to the sum of both datatypes (LogRatio and AllelicRatio) for each of the dens.bw and scat components

=================================================================================
TitanCNA version 1.5.4 changes (Committed revision 96687.)
=================================================================================
1) Model selection default changed
– Affected function: outputModelParameters
– New value can be used for argument: S_Dbw.data.type = “Both”
– “Both” will sum the S_Dbw index for “AllelicRatio” and “LogRatio”; this will account for both datatypes when selecting the best solution
– outputModelParameters() will now return the S_Dbw information

2) Vignette additions
– Under section “Format and print results to file”
– added instructions for using S_Dbw scale to penalize higher number of clonal clusters

3) Updated references in manual pages
– Ha et al. (2014). Genome Research, 24: 1881-1893.

=================================================================================
TitanCNA version 1.5.3 changes (Committed revision 96468.)
=================================================================================
Note: This is the development version
1) Minor implementation change
– Affected function: viterbiClonalCN, runEMclonalCN
– The probability of transitioning into a different clonal cluster was originally set to txnZstrength*txnExpLen in the C code. It is now decoupled from the C code and specified in the R code instead.

2) Vignette correction/additions
– Under section “Filter the data”, changed to:
– data <- filterData(data, c(1:22, “X”, “Y”),…
– Under section “Format and print results to file”
– added instructions for using S_Dbw scale to penalize higher number of clonal clusters

3) Bug fixes
– Affected function: plotSubcloneProfiles
– Whole-genome profile plots of subclones (using chr=NULL) previously did not work properly.

=================================================================================
TitanCNA version 1.5.2 changes (Committed revision 95883.)
=================================================================================
Note: Bioconductor v3.1 (development)
1) Modified functions
– Affected function: getPositionOverlap
– Now uses RangedData objects and findOverlaps utility function internally
– getPositionOverlap is now much faster than before

=================================================================================
TitanCNA version 1.5.1 changes (Committed revision 95701.)
=================================================================================
Note: Bioconductor v3.1 (development)
1) Added parameter arguments
– Affected function: loadAlleleCounts
– argument “header” added so users can indicate whether the input counts file contains a header line
2) Error checking
– Affected function: loadAlleleCounts
– Added file format checking to ensure chromosome coordinates and read counts are integers. (note chromosomes are still characters to accommodate X and Y)

=================================================================================
TitanCNA version 1.3.0 changes (Committed revision 93417.)
=================================================================================
1) Changed default parameters
– Affected function: loadDefaultParameters
– genotypeParams$rt, the logR noise scalar added to heterozygous states, is changed from 0.08 to 0.05
2) Added new functionality and options to computation of S_Dbw validity index
– Affected public function: computeSDbwIndex
– Added argument “data.type”
– data.type can be set to “LogRatio” (default) or “AllelicRatio”
– Samples with stronger alteration signals in the allelic ratio data than in the log ratio data can use “AllelicRatio” to compute the S_Dbw index for model selection
– Changed argument: ‘method’ to ‘centroid.method’
– Added functionality of choosing between “Halkidi” or “Tong” S_Dbw method
– New argument ‘S_Dbw.method’ can be set to “Halkidi” or “Tong” (default)
– References:
– Halkidi and Vazirgiannis (2001). Clustering Validity Assessment: Finding the Optimal Partition of a Data Set
– Tong and Tan (2009) Cluster validity based on the improved S_Dbw index
– Note that for Tong’s method, the computation of SCAT(c) is likely incorrect. This function uses ni/N instead of (N-ni)/N.
3) Added and updated arguments to outputModelParameters() function
– Affected public function: outputModelParameters
– Added arguments S_Dbw.data.type (default ‘LogRatio’), S_Dbw.scale (default 1), S_Dbw.method (default “Tong”).

=================================================================================
TitanCNA version 1.2.1 changes (Committed revision 92909.)
=================================================================================
1) Critical bug fix for memory usage
– Affected public functions: runEMclonalCN
– Fixed a bug in memory management in the C implementation of logsumexp for the forwards-backwards algorithm
2) Fixed bug for decoding TITAN state when symmetric = FALSE
– Affected public functions: loadAlleleCounts, runEMclonalCN, outputTitanResults
– loadDefaultParameters now returns the list element “symmetric”, which is propagated as a list element in the output of runEMclonalCN. outputTitanResults now requires the “symmetric” element of its convergeParams argument to determine whether the allelic ratio data was analyzed in symmetric mode.
3) Changed default parameters
– Affected functions: loadDefaultParameters
– genotypeParams$alphaKHyper initialized to 15,000 prior counts for all states
4) Updated Vignette with Subclone profile output
– Affected function: outputTitanResults
– Additional argument, “subcloneProfiles” to indicate whether to output subclone profiles. This only works for 1 or 2 clonal clusters in the solution
– New function: plotSubcloneProfiles
– Plots the copy number profile for the predicted subclones. Only works for 1 or 2 clonal clusters in the solution

=================================================================================
TitanCNA version 1.2.0 changes (Committed revision 91120.)
=================================================================================
1) Added new functionality for extracting read counts from BAM files
– New function: extractAlleleReadCounts
2) Added support for conversion between UCSC and NCBI chromosomes in input files
– Affected functions: loadAlleleCounts, correctReadDepth
3) Changed default parameters
– Affected functions: loadDefaultParameters
– genotypeParams$rt now has a 0.08 logR noise scalar added to heterozygous states
– genotypeParams$alphaKHyper initialized to 5,000 prior counts for all states; extreme states (HOMD and 4 copies or higher) are initialized to 15,000.
4) Forwards-Backwards Algorithm now computes the posterior in log space
– Affects: runEMclonalCN, fwd_backC_clonalCN.c
– Now enables users to use coarser segmentation settings (txnExpLen and txnZstrength arguments in runEMclonalCN)
5) Added new functionality for interpreting subclone profiles
– Affects: outputTitanResults
– For 2 clonal cluster solutions, will return 2 subclone profiles appended to results data.frame
– New function: plotSubcloneProfiles
– Plots the 2 subclone profiles
6) Fixed bug for interpreting the final 24th state (when symmetric=TRUE)
– Affected functions (private): decodeLOH