mutationSeq


mutationSeq is a software suite that uses feature-based classifiers to predict somatic mutations from paired tumour/normal next-generation sequencing data. Its advantage is that it integrates different features (e.g., base qualities, mapping qualities, strand bias, and tailed distance features) with validated somatic mutations to make predictions. Given paired normal/tumour BAM files, mutationSeq outputs the probability of each candidate site being somatic.

Publications

Jiarui Ding; Ali Bashashati; Andrew Roth; Arusha Oloumi; Kane Tse; Thomas Zeng; Gholamreza Haffari; Martin Hirst; Marco A. Marra; Anne Condon; Samuel Aparicio; Sohrab P. Shah. “Feature based classifiers for somatic mutation detection in tumour-normal paired sequencing data.” Bioinformatics 2011; doi: 10.1093/bioinformatics/btr629

Download

Download the first version: mutationSeq_v1.tar.gz.

Installation

  1. Download and extract mutationSeq_v1.tar.gz into the desired folder <$install_dir>
    cd <$install_dir>
    tar xvzf mutationSeq_v1.tar.gz
  2. Supported operating systems
    mutationSeq has been tested on 64-bit Linux machines with Matlab 2007b and Matlab 2010b. It may also run on Mac machines, but this has not been tested.
  3. Prerequisites and dependencies
    mutationSeq currently depends on several third-party packages. Samtools-0.1.16 and GATK v1.0.5543M are provided pre-compiled for 64-bit Linux systems in mutationSeq/toolbox/. randomforest-matlab v0.02, BayesTree v0.3-1.1, libsvm-mat-3.0-1, and liblinear-1.7 are provided pre-compiled for 64-bit Linux systems in mutationSeq/classifier.
  4. Configuration/parameter settings
    The hyper-parameters of classifiers can be set by editing the file mutationSeq/classifier/para.txt. Currently, the parameters are as follows:
    RF,m=1000,p=2
    SVM,c=0.003906
    Logit,rho=64
    BART,m=100,k=2
  5. A demo file shows how to run mutationSeq
    A demo file is provided at /mutationSeq/demo/lobular/demo_callmutation_from_bam.m. It shows how to run mutationSeq given paired normal/tumour BAM files and a list of candidate somatic mutations. mutationSeq then outputs the probability of each candidate site being somatic.
  6. Run the compiled software
    1. If you have Matlab installed, the Matlab files can be converted to C files and then compiled into executable files. For example, to compile the prediction function into an executable file, run the following in Matlab:
      cd classifier
      addpath(genpath(pwd))
      cd ../util
      addpath(pwd)
      cd ../bin
      mcc -m mutationseq_predict.m -R nodisplay

      The executable file can then be run as follows (assuming the reference genome is in $ref_genome_dir):

      $install_dir/bin/mutationseq_predict $install_dir/data/Lobular_normal_srt.bam $install_dir/data/Lobular_tumour_srt.bam $install_dir/data/Lobular_list_srt.txt $install_dir/demo/lobular/model.mat $ref_genome_dir/human_all.fasta 75 $install_dir

      Common mistakes to avoid: the list file must be sorted, must not contain repeated entries, and must not have blank lines at the beginning or at the end.
    2. If you don’t have Matlab installed, download the MATLAB Component Runtime (MCR) installer, MCRInstaller.bin, and extract it to <$install_mcr_dir>. Then run the following commands:

      cd $install_mcr_dir
      ./MCRInstaller.bin

      To run the compiled functions:

      cd $install_dir/bin

      $install_dir/bin/run_mutationseq_predict.sh $install_mcr_dir/v77/ $install_dir/data/Lobular_normal_srt.bam $install_dir/data/Lobular_tumour_srt.bam $install_dir/data/Lobular_list_srt.txt $install_dir/demo/lobular/model.mat $ref_genome_dir/human_all.fasta 75 $install_dir

      Note: this time the first parameter is $install_mcr_dir/v77/. Moreover, we run the shell script run_mutationseq_predict.sh instead of running the executable file mutationseq_predict directly.
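
The common list-file mistakes mentioned in step 6 can be caught before a run with a small check. The following Python sketch is not part of mutationSeq. It assumes that each line of the list file starts with a chromosome name and an integer position separated by whitespace, and that "sorted" means each chromosome forms one contiguous block with non-decreasing positions. Both are assumptions about the format, so adjust the parsing to match your actual list file. The function name check_candidate_list is hypothetical.

```python
def check_candidate_list(text):
    """Return a list of problems found in the text of a candidate list file.

    ASSUMPTION: each non-blank line starts with '<chromosome> <position>'.
    ASSUMPTION: 'sorted' means one contiguous block per chromosome,
    with non-decreasing positions inside each block.
    """
    problems = []
    lines = text.split("\n")
    # A final newline terminator is normal; drop the empty string it produces.
    if lines and lines[-1] == "":
        lines.pop()
    if not lines:
        return ["file is empty"]
    if lines[0].strip() == "":
        problems.append("blank line at the beginning")
    if lines[-1].strip() == "":
        problems.append("blank line at the end")

    seen = set()             # (chromosome, position) pairs already read
    finished_chroms = set()  # chromosomes whose block has already ended
    prev_chrom, prev_pos = None, None
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines at the ends were reported above
        if len(fields) < 2 or not fields[1].isdigit():
            problems.append(f"line {number}: expected '<chromosome> <position>'")
            continue
        chrom, pos = fields[0], int(fields[1])
        if (chrom, pos) in seen:
            problems.append(f"line {number}: repeated entry {chrom} {pos}")
        seen.add((chrom, pos))
        if chrom != prev_chrom:
            if chrom in finished_chroms:
                problems.append(f"line {number}: {chrom} is not in one contiguous block")
            if prev_chrom is not None:
                finished_chroms.add(prev_chrom)
        elif pos < prev_pos:
            problems.append(f"line {number}: position {pos} is out of order")
        prev_chrom, prev_pos = chrom, pos
    return problems
```

An empty result means none of the three problems was detected. To check a real file, pass its contents to the function, for example check_candidate_list(open(path).read()), where path points to your own list file.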

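The parameter lines shown in step 4 follow a simple pattern: a classifier name followed by comma-separated key=value pairs. As an illustration only, the Python sketch below reads text in that pattern into a dictionary, which can be handy for logging the settings used in a run. The helper name parse_para is hypothetical and not part of mutationSeq, and the sketch assumes every value is numeric, as in the defaults listed in step 4.

```python
def parse_para(text):
    """Parse para.txt-style lines into {classifier: {parameter: value}}.

    ASSUMPTION: every line looks like 'NAME,key=value,key=value'
    and every value is a number.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        name, *pairs = line.split(",")
        params = {}
        for pair in pairs:
            key, _, raw = pair.partition("=")
            raw = raw.strip()
            # Keep whole numbers as int and everything else as float.
            try:
                value = int(raw)
            except ValueError:
                value = float(raw)
            params[key.strip()] = value
        settings[name.strip()] = params
    return settings


# The default settings quoted in step 4 of the installation notes.
EXAMPLE = """RF,m=1000,p=2
SVM,c=0.003906
Logit,rho=64
BART,m=100,k=2
"""

if __name__ == "__main__":
    for classifier, params in parse_para(EXAMPLE).items():
        print(classifier, params)
```

The sketch only reads the settings. To change them, edit mutationSeq/classifier/para.txt directly as described in step 4.
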
Reproduce the cross-validation and test results from Ding et al., “Feature based classifiers for somatic mutation detection in tumour-normal paired sequencing data”

The sub-folder mutationSeq/demo/paper_demo/ contains the file demo_cv.m; running it regenerates Figure 1(a) of the manuscript, and running demo_test.m regenerates Figure 2. For these two demos, the feature vectors were pre-computed and are stored in mutationSeq/feature. The column headers are described in mutationSeq/feature/README. Note that because of the stochastic nature of the Random Forest and Bayesian additive regression tree models, the outputs may differ slightly.

