Hybrid Population Genetic-Phylogenetic Codon Models that Allow Tunneling through Deleterious Intermediate States

Understanding the relationship between genotype, phenotype, and function is one of the fundamental problems of genetics. Evolutionary approaches can help to illuminate biology by treating natural populations as experiments and studying their outcomes. Using such approaches, relationships between genotypic variation and biological function can be inferred, helping to reveal which sequence variants ‘work’ in natural populations and, by subtraction, which variants could contribute to disease. We are interested in using comparative genomics to infer quantitative fitness estimates that can help prioritize rare variants in medical genetics studies. Hybrid population genetic-phylogenetic models such as mutation-selection codon models are well suited to this purpose, but a significant limitation of existing models is that they consider only one single-nucleotide mutation going to fixation at a time. As a result, populations under such models are unable to move from one fitness peak to another when every single-nucleotide intermediate is deleterious. We have found that this leads to poor estimates of the relative fitness of different sequence variants. We propose a more realistic ‘stepping stone fixation’ model (a.k.a. stochastic tunneling) and implement it using our modeling workbench, Palantír. Our model allows certain kinds of fitness valleys to be traversed by explicitly considering a novel approximation to the expected persistence time of deleterious alleles that accounts for several factors, including dominance. We show that this model leads to more realistic estimates of fitness and that it solves several technical problems with existing models. We hope this model will improve our ability to leverage comparative and population genomics to understand the biomedical consequences of human genomic variation.
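To illustrate the limitation described above, the following minimal sketch uses the standard weak-mutation result that a mutation with scaled selection coefficient S fixes at a rate S / (1 - exp(-S)) relative to a neutral mutation. It is not the stepping stone model proposed here, nor code from Palantír; the function name and the example values of S are assumptions chosen only for illustration.

```python
import math

def relative_fixation_rate(S):
    """Fixation rate relative to a neutral mutation, S / (1 - exp(-S)).

    S is the scaled selection coefficient of the new allele
    (hypothetical values are used below; moderate |S| only).
    """
    if abs(S) < 1e-8:
        return 1.0  # neutral limit of the formula
    # -expm1(-S) equals 1 - exp(-S) and is accurate for small |S|
    return S / -math.expm1(-S)

# A two-step path between fitness peaks: the first single-nucleotide
# change is deleterious (the valley), the second restores high fitness.
S_into_valley = -10.0   # assumed scaled cost of the intermediate codon
S_out_of_valley = 10.0  # assumed scaled gain of the second change

rate_in = relative_fixation_rate(S_into_valley)
rate_out = relative_fixation_rate(S_out_of_valley)

# Under one-fixation-at-a-time models the intermediate must itself fix,
# so the first step throttles the whole path.
print(f"step into valley:   {rate_in:.3e} x neutral")
print(f"step out of valley: {rate_out:.3f} x neutral")
```

With these assumed values, the step into the valley is more than a thousand times slower than a neutral substitution, which is why a model requiring the intermediate to fix effectively forbids the crossing. A tunneling model instead lets the second mutation arise while the deleterious intermediate is still segregating.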