Spatial Statistical Tools for Genome-Wide Mutation Thundershower Detection under a Microarray Probe Sampling System

Bin Luo1, Alanna K. Edge2, Charmaine Dean1, Kathleen A. Hill2, Reg Kulperger1

1. Department of Statistical and Actuarial Sciences, Western University; 2. Department of Biology, Western University

The K-signature, also called “kataegis” or thundershower, is a recently described signature of clustered mutations across the genome. It is challenging to detect because it occurs at scattered locations in the genome and only in some cancers of certain cancer types. In contrast to whole-genome sequencing, organism-specific single nucleotide polymorphism (SNP) genotyping arrays offer a cost-effective approach to high-resolution mutation detection at hundreds of thousands of probe sites for a large number of samples. For example, the Mouse Diversity Genotyping Array (MDGA) is designed to detect single-nucleotide mutations at about 500,000 loci across the mouse genome in any tissue or cell sample.

Specialized statistical tools are required to test whether SNP site differences are randomly distributed across the genome. Based on the characteristics of the array probe design, several test statistics are developed that characterize the spatial dispersion of SNP site mutations. Monte Carlo simulations are performed to obtain the null distributions of these test statistics. The Neyman-Scott process, a parent-child clustering mechanism, is proposed as the alternative hypothesis for a power study in which the performance of several established test statistics is evaluated by Monte Carlo simulation. Recommendations are made concerning the test statistics that show good performance under various parameter settings. The test statistics are applied to real biological samples for illustration. These spatial statistical tools are applicable to other genotyping arrays, including the popular Genome-wide Human Genotyping Array 6.0.
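The testing workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the probe layout is synthetic, the test statistic (a variance-to-mean ratio of mutation counts over windows holding equal numbers of probes) is a generic dispersion index standing in for the statistics developed in the work, and the clustered alternative is a simplified parent-child mechanism in the spirit of a Neyman-Scott process, conditioned on a fixed number of mutated probes. All function names and parameter values are hypothetical.

```python
import bisect
import random


def window_dispersion(n_probes, mutated, n_windows):
    """Variance-to-mean ratio of mutated-probe counts per window.

    Windows are defined on probe index, so each window holds (nearly) the
    same number of probes regardless of uneven probe spacing.
    """
    counts = [0] * n_windows
    size = n_probes / n_windows
    for idx in mutated:
        counts[min(int(idx / size), n_windows - 1)] += 1
    mean = sum(counts) / n_windows
    var = sum((c - mean) ** 2 for c in counts) / (n_windows - 1)
    return var / mean


def nearest_probe(probes, x):
    """Index of the probe position closest to genomic coordinate x."""
    i = bisect.bisect_left(probes, x)
    if i == 0:
        return 0
    if i == len(probes):
        return len(probes) - 1
    return i if probes[i] - x < x - probes[i - 1] else i - 1


def clustered_mutations(probes, m, n_parents, spread, rng):
    """Assumed alternative: Gaussian offspring around uniform parents,
    snapped to the nearest probe, until m distinct probes are mutated."""
    lo, hi = probes[0], probes[-1]
    parents = [rng.uniform(lo, hi) for _ in range(n_parents)]
    hit = set()
    while len(hit) < m:
        x = rng.gauss(rng.choice(parents), spread)
        if lo <= x <= hi:
            hit.add(nearest_probe(probes, x))
    return hit


def monte_carlo_p(n_probes, observed, n_windows, n_sim, rng):
    """Upper-tail Monte Carlo p-value; the null places the same number of
    mutations uniformly at random over the probe sites."""
    t_obs = window_dispersion(n_probes, observed, n_windows)
    m = len(observed)
    exceed = 0
    for _ in range(n_sim):
        null_sample = rng.sample(range(n_probes), m)
        if window_dispersion(n_probes, null_sample, n_windows) >= t_obs:
            exceed += 1
    return (1 + exceed) / (1 + n_sim)


# Synthetic example: 5,000 probes at random positions on a 1 Mb region.
rng = random.Random(2024)
probes = sorted(rng.sample(range(1_000_000), 5000))
observed = clustered_mutations(probes, m=60, n_parents=3, spread=2000.0, rng=rng)
p_value = monte_carlo_p(len(probes), observed, n_windows=50, n_sim=199, rng=rng)
print(f"Monte Carlo p-value for the clustered sample: {p_value:.3f}")
```

Sampling the null over probe indices rather than genomic coordinates reflects the constraint that mutations can only be observed where the array has probes. Repeating the last three lines over many simulated clustered samples and recording how often the p-value falls below a chosen level would give a rough power estimate for this statistic.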