Research
Our research group develops computational methods for the analysis of high-throughput cancer genomics data. We use machine learning techniques to develop statistical models that infer genomic abnormalities from next generation sequencing data and high-density genotyping arrays. Our group works in conjunction with Dr. David Huntsman and Dr. Sam Aparicio, who lead ovarian cancer and breast cancer research, respectively, at the BC Cancer Agency. We focus on the analysis of clinical samples with the goal of detecting genetic abnormalities in tumour genomes that might be developed into diagnostics, prognostics or novel therapeutic targets. See our publications page for recent papers. Most of our software is released under an open source license and is free to download.
MutationSeq paper on feature based classifiers for somatic mutation detection published
Jiarui Ding; Ali Bashashati; Andrew Roth; Arusha Oloumi; Kane Tse; Thomas Zeng; Gholamreza Haffari; Martin Hirst; Marco A. Marra; Anne Condon; Samuel Aparicio; Sohrab P. Shah. “Feature based classifiers for somatic mutation detection in tumour-normal paired sequencing data.”

SNVMix
Detecting single nucleotide variants from next generation sequencing data
SNVMix is designed to detect single nucleotide variants from next generation sequencing data. SNVMix is a post-alignment tool. Given a pileup file (either Maq or Samtools format) as input and model parameters, SNVMix will output the probability that each position is one of three genotypes: aa (homozygous for the reference allele, where the reference is the genome the reads were aligned to), ab (heterozygous) and bb (homozygous for a non-reference allele). A tool for fitting the model using expectation maximization is also supplied (use -T option).
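The genotype probabilities described above can be illustrated with a small Python sketch. It assumes a three-component binomial mixture in which mu[k] is the probability of observing the reference allele under genotype k and pi[k] is the prior probability of genotype k. The function names and the parameter values below are hypothetical and chosen for illustration only. They are not taken from the SNVMix source code or from a fitted model file.

```python
import math

GENOTYPES = ("aa", "ab", "bb")


def binom_log_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)


def genotype_posteriors(ref_count, depth, mu, pi):
    """Posterior probability of each genotype at one position.

    ref_count: number of reads matching the reference allele
    depth:     total number of reads covering the position
    mu, pi:    per-genotype binomial parameters and prior weights
    """
    log_joint = [
        math.log(pi[k]) + binom_log_pmf(ref_count, depth, mu[k])
        for k in range(len(GENOTYPES))
    ]
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_joint)
    weights = [math.exp(v - top) for v in log_joint]
    total = sum(weights)
    return {g: w / total for g, w in zip(GENOTYPES, weights)}


# Illustrative values only (assumptions, not fitted parameters).
MU = (0.99, 0.5, 0.01)
PI = (0.98, 0.01, 0.01)

posteriors = genotype_posteriors(ref_count=12, depth=25, mu=MU, pi=PI)
for genotype in GENOTYPES:
    print(genotype, round(posteriors[genotype], 4))
```

In this sketch a position where roughly half of the reads match the reference is assigned to ab despite the strong prior on aa, because the binomial likelihood under mu = 0.99 is vanishingly small for such counts.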
Download
The source code, implemented in C, is distributed under an open source license. Supported platforms are Linux and Mac OS X. A working gcc compiler is needed, and under Linux libc >= 4.6.27 is required.
Download latest version: SNVMix2-0.11.8-r4.tar.gz.
Changes:
- Fixed a parsing problem present when generating pileups for RNA-Seq data with samtools > 0.1.8.
- Error reporting when number of columns in pileup file is wrong.
Notes for version 0.11.8:
Alpha, beta and delta parameters can now be specified on the command line for training using three new flags:
-a #,#,# Provide alpha training parameters
-b #,#,# Provide beta training parameters
-d #,#,# Provide delta training parameters
You can also specify training parameters in a space-separated file using:
-M Provide a file containing training parameters
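How such training parameters could enter an expectation maximization fit is sketched below in Python. The sketch assumes that alpha and beta parameterize Beta priors on the per-genotype binomial parameters mu, and that delta parameterizes a Dirichlet prior on the genotype weights pi. This page does not state that, so treat it as an assumption. The function names, the toy counts and the prior values are hypothetical, and the code is not the SNVMix implementation.

```python
import math


def binom_log_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)


def fit_em(counts, mu, pi, alpha, beta, delta, n_iter=50):
    """MAP expectation maximization for a binomial mixture.

    counts: list of (ref_count, depth) pairs, one per position
    mu, pi: initial per-genotype parameters and weights
    alpha, beta, delta: prior hyperparameters, all assumed > 1 so that
                        the updates stay strictly inside (0, 1)
    """
    mu, pi = list(mu), list(pi)
    n_states = len(mu)
    for _ in range(n_iter):
        resp_sum = [0.0] * n_states   # expected number of positions per state
        ref_sum = [0.0] * n_states    # expected reference reads per state
        depth_sum = [0.0] * n_states  # expected total reads per state
        # E-step: responsibilities of each genotype for each position.
        for ref_count, depth in counts:
            log_joint = [
                math.log(pi[k]) + binom_log_pmf(ref_count, depth, mu[k])
                for k in range(n_states)
            ]
            top = max(log_joint)
            weights = [math.exp(v - top) for v in log_joint]
            total = sum(weights)
            for k in range(n_states):
                r = weights[k] / total
                resp_sum[k] += r
                ref_sum[k] += r * ref_count
                depth_sum[k] += r * depth
        # M-step: posterior-mode updates under the Beta and Dirichlet priors.
        mu = [
            (ref_sum[k] + alpha[k] - 1.0) / (depth_sum[k] + alpha[k] + beta[k] - 2.0)
            for k in range(n_states)
        ]
        pi_norm = sum(resp_sum) + sum(delta) - n_states
        pi = [(resp_sum[k] + delta[k] - 1.0) / pi_norm for k in range(n_states)]
    return mu, pi


# Toy data and hyperparameters (assumptions for illustration only).
counts = [(30, 30), (29, 30), (40, 40), (15, 30), (12, 25), (0, 30), (1, 28)]
mu_hat, pi_hat = fit_em(
    counts,
    mu=(0.9, 0.5, 0.1),
    pi=(1 / 3, 1 / 3, 1 / 3),
    alpha=(10.0, 5.0, 2.0),
    beta=(2.0, 5.0, 10.0),
    delta=(10.0, 2.0, 2.0),
)
print([round(m, 3) for m in mu_hat])
print([round(p, 3) for p in pi_hat])
```

Under these assumptions the priors act as pseudo-counts: larger alpha and beta values pull mu towards the prior mean, and delta does the same for pi, which matters most when few positions support a genotype.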
Updating to this version is also recommended because it includes a bug fix. Older versions affected by that bug suffered sporadic segmentation faults when dealing with model files.
Installation
> tar -xzvf SNVMix2-0.11.8-r4.tar.gz
> cd SNVMix2-0.11.8-r4/
> make
> ./SNVMix2 -h
Model parameter file
In the absence of training data, a model file (input with -m) containing the mu and pi parameters of the model derived in Shah et al., Nature (2009) is provided here:
References:
If you use SNVMix in your work, please cite the following papers:
Sohrab P. Shah, Ryan D. Morin, Jaswinder Khattra, Leah Prentice, Trevor Pugh, Angela Burleigh, Allen Delaney, Karen Gelmon, Ryan Giuliany, Janine Senz, Christian Steidl, Robert A. Holt, Steven Jones, Mark Sun, Gillian Leung, Richard Moore, Tesa Severson, Greg A. Taylor, Andrew E. Teschendorff, Kane Tse, Gulisa Turashvili, Richard Varhol, Rene L. Warren, Peter Watson, Yongjun Zhao, Carlos Caldas, David Huntsman, Martin Hirst, Marco A. Marra and Samuel Aparicio. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature, vol. 461, 809-813 (2009). [PDF]
Goya R, Sun MG, Morin RD, Leung G, Ha G, Wiegand KC, Senz J, Crisan A, Marra MA, Hirst M, Huntsman D, Murphy KP, Aparicio S, Shah SP. SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors. Bioinformatics. 2010 Mar 15;26(6):730-6. [Link]
Comments and/or questions should be directed to Sohrab Shah <sshah@bccrc.ca> and Rodrigo Goya <rgoya@bcgsc.ca>.
JointSNVMix software released
Description
JointSNVMix implements a probabilistic graphical model to analyse sequence data from tumour/normal pairs. The model draws statistical strength by analysing both genomes jointly to more accurately classify germline and somatic mutations.
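The idea of analysing both genomes jointly can be illustrated with a short Python sketch. It assumes a mixture over the nine joint (normal, tumour) genotype pairs, with independent binomial likelihoods for the two samples and a shared prior over pairs. It also treats a position as somatic when the normal is aa and the tumour is ab or bb. The function names, the prior weights and the mu values are hypothetical, and the sketch is not the JointSNVMix implementation.

```python
import math
from itertools import product

GENOTYPES = ("aa", "ab", "bb")


def binom_log_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)


def joint_posteriors(normal, tumour, mu_normal, mu_tumour, pi_joint):
    """Posterior over joint genotypes at one position.

    normal, tumour: (ref_count, depth) for each sample
    mu_normal, mu_tumour: per-genotype reference-allele probabilities
    pi_joint: dict mapping (normal_genotype, tumour_genotype) to a prior
    """
    log_joint = {}
    for (i, g_n), (j, g_t) in product(enumerate(GENOTYPES), repeat=2):
        log_joint[(g_n, g_t)] = (
            math.log(pi_joint[(g_n, g_t)])
            + binom_log_pmf(normal[0], normal[1], mu_normal[i])
            + binom_log_pmf(tumour[0], tumour[1], mu_tumour[j])
        )
    top = max(log_joint.values())
    weights = {pair: math.exp(v - top) for pair, v in log_joint.items()}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def somatic_probability(posteriors):
    """Probability that the normal is aa while the tumour carries a variant."""
    return posteriors[("aa", "ab")] + posteriors[("aa", "bb")]


# Illustrative prior (an assumption): most positions are aa in both samples,
# matching germline variants are uncommon, everything else is rare.
raw = {pair: 1.0 for pair in product(GENOTYPES, repeat=2)}
raw[("aa", "aa")] = 1e4
raw[("ab", "ab")] = 100.0
raw[("bb", "bb")] = 100.0
norm = sum(raw.values())
PI_JOINT = {pair: w / norm for pair, w in raw.items()}
MU = (0.99, 0.5, 0.01)

somatic_like = joint_posteriors((30, 30), (16, 30), MU, MU, PI_JOINT)
germline_like = joint_posteriors((15, 30), (16, 30), MU, MU, PI_JOINT)
print(somatic_probability(somatic_like) > somatic_probability(germline_like))
```

Scoring the pair jointly lets evidence from the normal sample lower the somatic probability directly: in the second example the variant reads in the normal make the matching germline state far more probable than any somatic state.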
Latest Release v0.6.0
Discovery of CIITA gene fusions in lymphoid cancers published in Nature.
Today, our discovery that CIITA is a recurrent gene fusion partner in lymphoid cancers appeared in Nature. Here is a link to the article.
Full citation:
Christian Steidl*, Sohrab P. Shah*, Bruce Woolcock, Pedro Farinha, Nathalie A. Johnson, Yongjun Zhao, Adele Telenius, Susana Ben Neriah, Arjan Diepstra, Anke van den Berg, Mark Sun, Gillian Leung, Joseph M. Connors, David G. Huntsman, Kerry J. Savage, Lisa Rimsza, Douglas E. Horsman, Marco A. Marra and Randy D. Gascoyne. MHC class II transactivator CIITA is a recurrent gene fusion partner in lymphoid cancers. Nature doi:10.1038/nature09754
Mutational evolution in breast cancer study published in Nature
From the Nature website:
Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution p809
Advances in next generation sequencing have made it possible to precisely characterize the coding mutations that occur during the development and progression of individual cancers. Here, this technique is used to sequence the genomes and transcriptomes of an oestrogen-receptor-alpha-positive metastatic lobular breast cancer; significant evolution is found to occur with disease progression.
News and Media: [CTV] | [CBC] | [Globe and Mail]
Blogs: Omics! Omics!




