Application of Rainfall Plots to Analyze the Genomic Landscape of SNP Differences between Mouse Tissues Detected Using the Mouse Diversity Genotyping Array

Alanna K. Edge1, Bin Luo2, Charmaine Dean2, Reg Kulperger2, Conny Toelg3, Eva A. Turley3, Kathleen A. Hill1

1. Department of Biology, Western University; 2. Department of Statistical and Actuarial Sciences, Western University; 3. London Regional Cancer Program, London Health Sciences Centre, London, ON, Canada N6A 5B7

In contrast to the typical view of spontaneous mutations as rare, independent events with random spacing, our current genome-wide perspective on mutation analysis has revealed a new spontaneous mutation signature. This K-signature, or “Kataegis”, describes “thundershowers”, or clusters, of mutations occurring across the genome. We first identified clustered mutations [Hill et al. 2004 Mutat Res 554:223-40] and provided evidence for mutation showers [Wang et al. 2007 PNAS 104:8403-8] and transient hypermutability using a transgene in the mouse. Recent studies using whole genome sequencing data plotted inter-mutation spacing along the genome landscape and visually captured clusters of proximal mutations in a rainfall plot. Given that mutation showers are rare events in healthy tissues and are associated with only certain cancers, whole genome sequencing is not a cost-effective screening tool. Instead, we propose using a high-density single nucleotide polymorphism (SNP) array to detect mutations with a genome-wide perspective. The Mouse Diversity Genotyping Array (MDGA) permits the analysis of mutations at 493,290 SNP loci at a fraction of the cost and with minimal bioinformatics effort compared to next-generation sequencing of the mouse genome. We applied our unique approach to the analysis of single nucleotide differences between different healthy tissues within 14 C57BL/6J mice and between multiple primary and metastatic tissue samples. Rainfall plots were used to portray inter-mutation spacing for each mouse tissue sample. Monte Carlo simulations with random selection of array probes were used to generate expected profiles with a random distribution of mutations and a baseline pattern of base substitutions detectable at the MDGA SNP loci. Primary and metastatic tissues have an elevated mutation load compared to healthy tissues, and the types of base substitutions are more heterogeneous.
There is a bias for C to G transversions at CpG sites in the healthy tissues examined from the C57BL/6J mice. Comparisons of observed spontaneous mutation landscapes to those expected for rare, random and independent mutational events are in progress. The combination of the MDGA and rainfall plots offers an efficient screening tool for detecting mutation clusters relevant to understanding the nature and role of transient hypermutability in normal tissue development and carcinogenesis.
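
As an illustration of the two computational steps described above (inter-mutation spacing for a rainfall plot, and a Monte Carlo baseline drawn by random selection of array probes), the following is a minimal Python sketch. It is not the authors' implementation: the function names, the `(chromosome, position)` input layout, the clustering statistic (fraction of gaps at or below `max_gap`), and the empirical p-value are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def intermutation_distances(mutations):
    """Return (chrom, pos, distance_to_previous) for each mutation that has a
    predecessor on the same chromosome. A rainfall plot would typically show
    log10(distance) against genomic position.
    `mutations` is an iterable of (chrom, pos) tuples (assumed layout)."""
    by_chrom = defaultdict(list)
    for chrom, pos in mutations:
        by_chrom[chrom].append(pos)
    out = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        for prev, cur in zip(positions, positions[1:]):
            out.append((chrom, cur, cur - prev))
    return out


def cluster_fraction(mutations, max_gap):
    """Fraction of inter-mutation distances <= max_gap (hypothetical statistic)."""
    gaps = [d for _, _, d in intermutation_distances(mutations)]
    if not gaps:
        return 0.0
    return sum(1 for d in gaps if d <= max_gap) / len(gaps)


def empirical_p_value(observed, probe_loci, max_gap, n_sims=1000, seed=0):
    """Monte Carlo baseline: repeatedly draw len(observed) loci at random
    (without replacement) from the array's probe loci and count how often the
    simulated clustering is at least as strong as the observed clustering."""
    rng = random.Random(seed)
    obs_stat = cluster_fraction(observed, max_gap)
    n_extreme = 0
    for _ in range(n_sims):
        simulated = rng.sample(probe_loci, len(observed))
        if cluster_fraction(simulated, max_gap) >= obs_stat:
            n_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (n_extreme + 1) / (n_sims + 1)
```

Sampling from the probe loci, rather than from uniformly random genomic coordinates, keeps the baseline restricted to positions the array can actually interrogate, so uneven probe density is not mistaken for mutation clustering. In practice `probe_loci` would hold the array's annotated SNP positions and `max_gap` would be chosen by the analyst; neither value is specified in the abstract.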