Cancer driver mutations control outcomes indirectly through intermediate phenotypes such as gene expression and protein expression. xseq is a probabilistic model that aims to encode the impact of somatic mutations on gene expression profiles. Its output can be used to identify mutations in cancer genomes that likely have a functional role in altering normal biological processes, leading to the malignant phenotype. The model is a generative hierarchical Bayes approach that takes three observed quantities as input: a patient-gene expression matrix, a patient-gene mutation matrix, and a graph of known interactions between genes (e.g. from pathway databases). The model outputs three quantities: the posterior probability that a mutation influences expression; at the population level, the posterior probability that the mutations of a gene, taken together, influence expression; and the posterior probabilities that each gene connected to a mutated gene is up-regulated, down-regulated or neutral in a patient. As such, this software can systematically evaluate the entire complement of mutations in a tumour and predict the degree to which they alter gene expression.


xseq has been tested on Mac OS version 10.6.8 and Linux CentOS release 5.11.

Questions or comments on the software or the model should be directed to Jiarui Ding (jiaruid at cs dot ubc dot ca) or Sohrab Shah (sshah at bccrc dot ca).


J. Ding, M.K. McConechy, H.M. Horlings, G. Ha, F. Chan, T. Funnell, S. Mullaly, J. Reimand, A. Bashashati, G.D. Bader, D. Huntsman, S. Aparicio, A. Condon, S.P. Shah. Systematic analysis of somatic mutations impacting gene expression in twelve tumour types. Nature Communications, article number 8554. doi:10.1038/ncomms9554

Supplementary Materials:
Supplementary Data 1: The collected 603 bona fide driver genes.
Supplementary Data 2: The 65 xseq predicted genes with cis-effect loss-of-function mutations.
Supplementary Data 3: The 131 candidate tumour suppressor genes and their P(TSG) probabilities.
Supplementary Data 4: The 60 xseq predicted trans-effect genes. These trans-effect genes were annotated as bona fide cancer driver genes.
Supplementary Data 5: The 89 novel xseq predicted trans-effect genes, which were not significantly mutated and were not annotated as bona fide cancer driver genes.
Supplementary Data 6: Enriched pathways and gene ontology terms in the novel trans-effect genes.
Supplementary Data 7: Genes stably up-regulated associated with mutations across tumour types.
Supplementary Data 8: Genes stably down-regulated associated with mutations across tumour types.

Used in:

Chan FC, Telenius A, Healy S, Ben-Neriah S, Mottok A, Lim R, Drake M, Hu S, Ding J, Ha G, Scott DW, Kridel R, Bashashati A, Rogic S, Johnson N, Morin RD,
Rimsza LM, Sehn L, Connors JM, Marra MA, Gascoyne RD, Shah SP, Steidl C.
An RCOR1 loss-associated gene expression signature identifies a prognostically significant DLBCL subgroup. Blood. 2015 Feb 5;125(6):959-66. doi: 10.1182/blood-2013-06-507152. Epub 2014 Nov 13. PubMed PMID: 25395426.

Mathelier A, Lefebvre C, Zhang AW, Arenillas DJ, Ding J, Wasserman WW, Shah SP. Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas. Genome Biol. 2015 Apr 23;16:84. doi: 10.1186/s13059-015-0648-7. PubMed PMID: 25903198


The updated version is available on CRAN as the xseq package.

You can also find updates at https://bitbucket.org/MO_BCCRC/xseq

You can download the influence graph as a text file from influence_graph.txt or as an R object from influence_graph.rda

For validation, we used the high-quality human interactome database downloaded from http://interactome.dfci.harvard.edu/H_sapiens/index.php?page=download

Quick demos of xseq

xseq is an R package written in the statistical programming language R (R-3.1.0 or higher); for efficiency, some code is written in C. It should work on Linux, Mac OS and Windows.


Required inputs:
1) a mutation matrix
2) a gene expression matrix
3) a gene interaction network

Optional inputs:
1) a copy number call matrix
2) a copy number log2 value matrix

The mutation matrix has mainly 3 columns:
– sample
– hgnc_symbol, and

The expression matrix is a patient-by-gene matrix whose row names are the patients and whose column names are the genes. The gene interaction network is a list. The copy number call and copy number log2 value matrices are also patient-by-gene matrices, with patients as row names and genes as column names.
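The input layout described above can be sketched in base R with toy data. This is only an illustration, not the package's bundled example data: the patient IDs, values and copy number placeholders are made up, only the two mutation columns named above are shown, and the network layout (a list keyed by gene, each element a named numeric vector of interaction weights) is an assumption, since the text says only that the network is a list.

```r
# Toy inputs only; all IDs and values below are invented for illustration.

# Mutation table: one row per mutation, with the two columns named in the text.
mut <- data.frame(
  sample      = c("P1", "P1", "P3"),
  hgnc_symbol = c("TP53", "FLT3", "TP53"),
  stringsAsFactors = FALSE
)

patients <- c("P1", "P2", "P3")
genes    <- c("TP53", "FLT3", "CDKN1A")

# Expression: patient-by-gene matrix (row names = patients, column names = genes).
set.seed(1)
expr <- matrix(rnorm(9), nrow = 3, dimnames = list(patients, genes))

# Optional copy number inputs use the same patient-by-gene layout.
# The zeros are placeholders, not a statement about how calls are coded.
cna.call <- matrix(0L, nrow = 3, ncol = 3, dimnames = list(patients, genes))
cna.logr <- matrix(0,  nrow = 3, ncol = 3, dimnames = list(patients, genes))

# Gene interaction network as a list (ASSUMED layout): one entry per gene,
# holding a named numeric vector of weights for the genes it connects to.
net <- list(
  TP53 = c(CDKN1A = 0.9),
  FLT3 = c(TP53 = 0.4)
)

# Basic consistency checks between the inputs.
stopifnot(all(mut$sample %in% rownames(expr)))
stopifnot(all(mut$hgnc_symbol %in% colnames(expr)))
stopifnot(identical(dimnames(expr), dimnames(cna.call)))
stopifnot(identical(dimnames(expr), dimnames(cna.logr)))
stopifnot(all(unlist(lapply(net, names)) %in% colnames(expr)))
```

Checking that sample and gene identifiers agree across the inputs before running any analysis tends to catch naming mismatches early.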

Examples of each input are included in the xseq package, with documentation. After installing the package, users can look up the help page for each input.
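One way to find the bundled examples and their help pages is sketched below. It uses only base R utilities (`requireNamespace`, `data`, `help`) and does not assume any particular dataset name; the names are whatever the installed package reports.

```r
# Check whether xseq is installed without attaching it.
has_xseq <- requireNamespace("xseq", quietly = TRUE)

if (has_xseq) {
  # List the example datasets shipped with the package (name and title).
  datasets <- data(package = "xseq")$results
  print(datasets[, c("Item", "Title"), drop = FALSE])

  # Show the package help index, which links to the page for each dataset.
  print(help(package = "xseq"))
} else {
  message("xseq is not installed; run install.packages(\"xseq\") first.")
}
```

From there, `?name` on any listed dataset name opens its documentation.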


To demonstrate how to use xseq to analyze the cis- and trans-effects of somatic mutations on gene expression, we provide examples in the xseq vignette.
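The vignette can be located from an R session with base R's `vignette()`; the sketch below lists whatever vignettes the installed package provides rather than assuming a vignette name.

```r
# Check whether xseq is installed without attaching it.
has_xseq <- requireNamespace("xseq", quietly = TRUE)

if (has_xseq) {
  # List the vignettes shipped with the installed package.
  vig <- vignette(package = "xseq")$results
  print(vig[, c("Item", "Title"), drop = FALSE])
} else {
  message("xseq is not installed; run install.packages(\"xseq\") first.")
}
```

In an interactive session, `browseVignettes("xseq")` opens the same list in a browser.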

To demonstrate how xseq works on real data, we provide a demo script, a network file, and the reformatted TCGA LAML datasets at https://bitbucket.org/MO_BCCRC/xseq