Identifying Driver Genes With Models of Mutational Heterogeneity

Aaron Mathankeri and Jason de Koning

A core aim of cancer research is to identify and catalogue the genes that cause cancer when mutated. These genes, termed “drivers”, are special because they confer a selective advantage by increasing the net replication rate of their carriers, resulting in uncontrolled growth. As individual cancers are sequenced more deeply, there is an opportunity to distinguish driver mutations more sensitively from random mutations in genes that do not initiate or promote cancer. To address this problem, we propose to identify drivers by first establishing an accurate background mutational spectrum across the human genome. Although it is widely accepted that mutation rates are heterogeneous across the genome, current methods of detecting drivers have not fully captured this heterogeneity because it is difficult to model properly together with its causes. We use a hidden Markov model with covariates to characterize the mutational spectrum across the genome, and we pool information from evolutionary data to infer detailed mutation models. This novel evolutionary approach will allow more sensitive identification of regions and patterns that are enriched for mutations in tumour genome sequences compared with normal ones. This information will be vital for leveraging cancer genome data to better understand the biology of cancer.
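As a rough illustration of what a hidden Markov model with covariates can look like, the sketch below computes the log-likelihood of mutation counts in consecutive genomic windows with the forward algorithm. Everything specific here is an assumption made for illustration and is not taken from the abstract: the Poisson emission model, the log-linear link between a single covariate and the mutation rate, the window-based layout, and all function and parameter names.

```python
import math

def log_poisson(k, rate):
    # Log probability of observing k events under a Poisson with the given rate.
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def logsumexp(values):
    # Numerically stable log(sum(exp(v))) for a list of log-values.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_loglik(counts, covariates, log_init, log_trans, intercepts, beta):
    """Log-likelihood of per-window mutation counts under a covariate HMM.

    Assumed (hypothetical) model: each hidden state s has a baseline
    log-rate intercepts[s]; the rate in window t is
    exp(intercepts[s] + beta * covariates[t]).
    """
    n_states = len(intercepts)

    def emit(s, t):
        rate = math.exp(intercepts[s] + beta * covariates[t])
        return log_poisson(counts[t], rate)

    # Initialise with the first window, then recurse over the remaining ones.
    alpha = [log_init[s] + emit(s, 0) for s in range(n_states)]
    for t in range(1, len(counts)):
        alpha = [
            logsumexp([alpha[r] + log_trans[r][s] for r in range(n_states)])
            + emit(s, t)
            for s in range(n_states)
        ]
    return logsumexp(alpha)

# Toy example: two hidden states ("low" and "high" background mutation rate).
counts = [0, 2, 1, 5, 4]                # made-up mutation counts per window
covariates = [0.0, 1.0, -0.5, 0.8, 0.2] # made-up covariate, one value per window
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.2), math.log(0.8)]]
loglik = forward_loglik(counts, covariates, log_init, log_trans,
                        intercepts=[0.0, 1.5], beta=0.3)
```

In a setting like the one described, windows whose observed counts are poorly explained by a fitted background model of this kind would be candidates for mutational enrichment; parameter fitting and the use of evolutionary data are outside this sketch.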