Population-based Detection of Structural Variants in Normal and Aberrant Genomes

Jean Monlong and Guillaume Bourque

McGill University, Human Genetics Dept., Canada

Structural variants (SVs) are key elements in evolution and complex diseases. Their detection from high-throughput sequencing (HTS) data has evolved substantially, increasing sensitivity to smaller events and improving breakpoint resolution. However, the fraction of false-positive predictions remains non-negligible, and few efforts have been directed toward problematic regions such as low-mappability or repeat-enriched regions, which are known to be rich in SVs but are generally excluded from analyses. After revealing important and complex technical biases in HTS data, we propose to use a large set of experiments and a population-based approach to robustly identify abnormal regions genome-wide.

Comparing read coverage across hundreds of samples requires an appropriate normalization step. We show that a general normalization is not sufficient to correct systematic sample-specific variation, and we develop a flexible, targeted approach. Each genomic bin is then tested with a Z-test-like score, and the results are adjusted with a robust multiple-testing correction. We test our approach on more than 100 paired normal and tumor whole-genome datasets from three different cancer resequencing projects. Compared to other approaches, ours detects more concordant germline events and more tumor-specific events. Very few regions of the genome were excluded, and a number of SVs were detected in low-mappability regions. The comprehensiveness and extended genomic coverage of the approach will benefit the characterization of variation in low-mappability regions, as well as cancer and disease studies where complex SVs play an important role.
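The abstract does not spell out the normalization or the exact score, so what follows is only a minimal sketch of the general population-based idea, under stated assumptions. It assumes read counts have already been binned along the genome, and it swaps in simple stand-ins: per-sample median scaling (a plain global normalization, not the targeted normalization described above), a robust Z-like score computed per bin against a reference population using the median and the median absolute deviation (MAD), two-sided normal p-values, and the Benjamini-Hochberg procedure in place of the robust multiple-testing correction. All function names and the toy counts are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def median_scale(coverage: Sequence[Sequence[float]]) -> List[List[float]]:
    """Divide each sample's bin counts by that sample's median bin count.

    coverage[s][b] is the read count of sample s in bin b (hypothetical layout).
    This is a simple global normalization, not the targeted one from the abstract.
    """
    scaled = []
    for sample in coverage:
        m = median(sample)
        scaled.append([c / m if m > 0 else 0.0 for c in sample])
    return scaled


def robust_z(test: Sequence[float], reference: Sequence[Sequence[float]]) -> List[float]:
    """Z-like score per bin: (x - median) / (1.4826 * MAD) over the reference samples.

    Bins where the reference shows no spread (MAD == 0) get a score of 0.0,
    i.e. they are treated as untestable in this sketch.
    """
    scores = []
    for b, x in enumerate(test):
        values = [ref[b] for ref in reference]
        med = median(values)
        mad = 1.4826 * median([abs(v - med) for v in values])
        scores.append((x - med) / mad if mad > 0 else 0.0)
    return scores


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), capped at 1."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def call_abnormal_bins(
    test_raw: Sequence[float],
    reference_raw: Sequence[Sequence[float]],
    alpha: float = 0.01,
) -> List[Tuple[int, float, float]]:
    """Return (bin index, z, q) for bins whose q-value is at most alpha."""
    scaled = median_scale([list(test_raw)] + [list(r) for r in reference_raw])
    test, reference = scaled[0], scaled[1:]
    z = robust_z(test, reference)
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = [math.erfc(abs(s) / math.sqrt(2)) for s in z]
    q = benjamini_hochberg(p)
    return [(b, z[b], q[b]) for b in range(len(z)) if q[b] <= alpha]


# Toy data (hypothetical): 5 reference samples x 6 bins with different depths,
# and a test sample whose bin 3 has roughly half the expected coverage.
REFERENCE_RAW = [
    [100, 104, 96, 102, 98, 101],
    [205, 195, 202, 198, 210, 190],
    [97, 103, 101, 99, 100, 104],
    [150, 146, 155, 149, 152, 144],
    [51, 49, 50, 52, 48, 53],
]
TEST_RAW = [121, 118, 123, 60, 119, 122]

if __name__ == "__main__":
    for b, z, q in call_abnormal_bins(TEST_RAW, REFERENCE_RAW):
        print(f"bin {b}: z = {z:.1f}, q = {q:.2e}")
```

Run on the toy data, only bin 3 (the simulated deletion) is reported. Scoring each bin against the same bin in other samples is what lets such a test absorb bin-specific technical bias shared across the population; a real analysis would also need the sample-specific normalization and a null model more robust than Benjamini-Hochberg on raw normal p-values.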