Statistical Evaluation of Normalization Methods for NanoString nCounter Data

Elika Garg¹, Ivan Topisirovic², and Robert Nadon¹,³

1. Department of Human Genetics, McGill University, Montreal, Canada; 2. Lady Davis Institute for Medical Research, Sir Mortimer B. Davis-Jewish General Hospital and Departments of Oncology, Experimental Medicine and Biochemistry, McGill University, Montreal, Canada; 3. McGill University and Genome Quebec Innovation Centre, Montreal, Canada

NanoString is a novel technology that is becoming widely accepted in the biomedical community for measuring gene expression. The count data generated by its nCounter machine are sensitive to the choice of pre-processing method. A statistical evaluation of pre-processing techniques is described and applied to four datasets, one of which was generated in-house for a prostate cancer study. As required by the statistical tests, variance within each dataset is stabilized before evaluation. The evaluation strategy combines multiple qualitative and quantitative assessments to check the validity of normalized data through a two-step examination. In the first step, replicates belonging to the same biological condition (e.g., within a control group) are compared, with high similarity expected under the null hypothesis. Methods that conform to this expectation proceed to the second step, where samples belonging to biologically opposite conditions (e.g., control vs. knockdown groups) are compared, with a significant number of differences expected. The method that performs best is deemed the most appropriate normalization method for that particular dataset. Results indicate that normalization methods using negative controls may be inappropriate for some data, as they performed poorly on three of the four datasets. Additionally, the importance of variance stabilization, an often-skipped pre-processing step, is established. This evaluation strategy has the potential for broad applicability across technologies and could improve research quality.
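
The two-step logic can be illustrated with a minimal Python sketch. It is not the authors' implementation: the log2 transform used for variance stabilization, the fixed thresholds standing in for formal statistical tests, and all function names and toy counts are assumptions made purely for illustration.

```python
import math
from statistics import mean


def stabilize(counts, pseudo=1.0):
    """Assumed variance-stabilizing step: log2(count + pseudocount)."""
    return [math.log2(c + pseudo) for c in counts]


def within_condition_discrepancy(rep_a, rep_b):
    """Step 1: mean absolute difference between two replicates of one condition."""
    return mean(abs(a - b) for a, b in zip(rep_a, rep_b))


def between_condition_hits(group_a, group_b, threshold=1.0):
    """Step 2: count genes whose group means differ by more than the threshold."""
    hits = 0
    for gene_idx in range(len(group_a[0])):
        mean_a = mean(sample[gene_idx] for sample in group_a)
        mean_b = mean(sample[gene_idx] for sample in group_b)
        if abs(mean_a - mean_b) > threshold:
            hits += 1
    return hits


def evaluate(control, knockdown, max_discrepancy=0.5):
    """Run step 1 on the control replicates; run step 2 only if step 1 passes."""
    control = [stabilize(sample) for sample in control]
    knockdown = [stabilize(sample) for sample in knockdown]
    discrepancy = within_condition_discrepancy(control[0], control[1])
    if discrepancy > max_discrepancy:
        # Replicates disagree too much: the method is rejected at step 1.
        return {"passed_step1": False, "discrepancy": discrepancy, "hits": None}
    hits = between_condition_hits(control, knockdown)
    return {"passed_step1": True, "discrepancy": discrepancy, "hits": hits}


# Hypothetical already-normalized counts: two samples per group, three genes each.
toy_control = [[100, 200, 50], [110, 190, 55]]
toy_knockdown = [[100, 800, 10], [105, 760, 12]]
result = evaluate(toy_control, toy_knockdown)
```

In this sketch, a candidate normalization method would be applied before `evaluate`, and methods that pass step 1 would then be ranked by how well step 2 recovers the expected differences.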