Population-based Detection of Structural Variants in Normal and Aberrant Genomes
McGill University, Human Genetics Dept., Canada
Structural variants (SVs) are key elements in evolution and complex diseases. Their detection from high-throughput sequencing (HTS) data has improved substantially, with greater sensitivity to smaller events and finer breakpoint resolution. However, the fraction of false-positive predictions remains non-negligible, and few efforts have been directed toward problematic regions such as low-mappability or repeat-enriched genomic regions, which are known to be rich in SVs but are generally excluded from analyses. After revealing important and complex technical biases in HTS, we propose to use a large set of experiments and a population-based approach to robustly identify abnormal regions genome-wide.
Comparing read coverage across hundreds of samples requires an appropriate normalization step. We show that a general normalization is not sufficient to correct systematic sample-specific variation, and we develop a flexible and targeted approach. The statistical test for each bin uses a Z-test-like score adjusted by a robust multiple-testing correction. We test our approach on more than 100 paired normal and tumor whole-genome datasets from three different cancer resequencing projects. We show that more concordant germline events and tumor-specific events are detected than with other approaches. Very few regions of the genome were excluded, and a number of SVs were detected in low-mappability regions. The comprehensiveness and extended genomic coverage of the approach will benefit the characterization of variation in low-mappability regions, as well as cancer and disease-related studies where complex SVs play an important role.
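
To make the per-bin test concrete, the following is a minimal sketch of a population-based Z-like score followed by a multiple-testing correction. It is an illustration under stated assumptions, not the method used in this work: the input is assumed to be an already normalized samples-by-bins coverage matrix, the median and MAD are one possible choice of robust center and spread, and Benjamini-Hochberg stands in for the unspecified "robust multiple-testing correction". All function names are hypothetical.

```python
import math
import numpy as np


def robust_z_scores(coverage):
    """Z-like score of every sample in every bin, relative to the population.

    coverage: array of shape (n_samples, n_bins), assumed already normalized.
    Assumption: median and MAD are used as robust center and spread.
    """
    cov = np.asarray(coverage, dtype=float)
    center = np.median(cov, axis=0)
    mad = np.median(np.abs(cov - center), axis=0)
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    # Bins with no spread across samples get z = 0 and are never called.
    scale = np.where(scale > 0, scale, np.inf)
    return (cov - center) / scale


def two_sided_pvalues(z):
    """Two-sided p-values under a standard normal null: p = erfc(|z| / sqrt(2))."""
    erfc = np.vectorize(math.erfc)
    return erfc(np.abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values for a 1-D array of p-values.

    Assumption: BH is used here only as a stand-in correction.
    """
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downward.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)
    return q
```

For one sample, `benjamini_hochberg(two_sided_pvalues(robust_z_scores(coverage)[i]))` would give adjusted p-values across bins; bins below a chosen threshold are candidate abnormal regions for sample `i`.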