Using Gene Genealogies to Understand the Patterns of Case-control Association

Charith B. Karunarathna and Jinko Graham

Department of Statistics and Actuarial Science, Simon Fraser University, BC, Canada

Gene genealogies provide information about the relationships among haplotypes sampled from a population, and are potentially useful in identifying disease-predisposing genetic variants. In this simulation study, we examine how genealogical trees relate the haplotypes of individuals with similar traits. Using the sequentially Markov coalescent implemented in fastsimcoal2, we simulate 3000 haplotypes of 4000 SNPs in a 2-Mbp genomic region with a recombination rate of 2 × 10⁻⁸ per bp per generation. Using an additive logistic model, we assign a dichotomous disease trait to individuals based on ‘risk’ SNPs that are randomly selected from the region to ensure a population disease prevalence of 10%. Once affected and unaffected individuals are assigned, we sample 50 ‘cases’ and ‘controls’ for a case-control data analysis. We present basic descriptive summaries of the resulting case-control data, such as SNP minor allele frequencies, heatmaps of pairwise linkage disequilibria between SNPs, and Manhattan plots of disease association. We then examine the gene genealogies underlying the case-control data and how they explain the observed patterns.
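
The trait-assignment and sampling steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The genotypes are toy data drawn independently per SNP rather than simulated with fastsimcoal2. The number of SNPs, the minor allele frequency, the number of risk SNPs and the per-allele effect size are all made-up values. The sketch reaches the 10% prevalence by calibrating the intercept of the logistic model with bisection, which is an assumption of this illustration and not the SNP-selection procedure described in the abstract. It also assumes that 50 cases and 50 controls are drawn. All function names are hypothetical.

```python
import math
import random


def simulate_genotypes(n_individuals, n_snps, rng, maf=0.1):
    """Toy genotypes: independent SNPs, 0, 1 or 2 minor alleles per individual."""
    return [[(rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps)]
            for _ in range(n_individuals)]


def disease_probability(genotype, risk_snps, intercept, effect):
    """Additive logistic model: logit(p) = intercept + effect * risk-allele count."""
    count = sum(genotype[j] for j in risk_snps)
    return 1.0 / (1.0 + math.exp(-(intercept + effect * count)))


def mean_probability(genotypes, risk_snps, intercept, effect):
    """Average disease probability over all individuals (the model prevalence)."""
    total = sum(disease_probability(g, risk_snps, intercept, effect) for g in genotypes)
    return total / len(genotypes)


def calibrate_intercept(genotypes, risk_snps, effect, prevalence):
    """Bisection on the intercept; the mean probability increases with the intercept."""
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if mean_probability(genotypes, risk_snps, mid, effect) < prevalence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sample_case_control(genotypes, risk_snps, intercept, effect, n_each, rng):
    """Assign disease status at random, then draw n_each cases and n_each controls."""
    status = [rng.random() < disease_probability(g, risk_snps, intercept, effect)
              for g in genotypes]
    cases = [i for i, affected in enumerate(status) if affected]
    controls = [i for i, affected in enumerate(status) if not affected]
    return rng.sample(cases, n_each), rng.sample(controls, n_each)


rng = random.Random(1)
# 3000 haplotypes paired into 1500 diploid individuals; 200 SNPs is a toy size.
genotypes = simulate_genotypes(n_individuals=1500, n_snps=200, rng=rng)
risk_snps = rng.sample(range(200), 5)   # assumed number of risk SNPs
effect = math.log(2.0)                  # assumed per-allele odds ratio of 2
intercept = calibrate_intercept(genotypes, risk_snps, effect, prevalence=0.10)
cases, controls = sample_case_control(genotypes, risk_snps, intercept, effect, 50, rng)
```

Calibrating the intercept keeps the risk SNPs and effect size fixed while matching the target prevalence, which makes the toy example easy to rerun with other assumed values.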