GENLIB: An R Package for the Analysis of Genealogical Data

Marie-Hélène Roy-Gagnon1,2, Héloïse Gauvin2,3, Jean-François Lefebvre2, Claudia Moreau2, Ève-Marie Lavoie4, Damian Labuda2,5, Hélène Vézina4

1. School of Epidemiology, Public Health and Preventive Medicine, University of Ottawa, Ottawa, Ontario, Canada; 2. CHU Sainte-Justine Research Center, Montreal, Quebec, Canada; 3. Department of Social and Preventive Medicine, Université de Montréal, Montreal, Quebec, Canada; 4. BALSAC Project, Université du Québec à Chicoutimi, Chicoutimi, Quebec, Canada; 5. Department of Pediatrics, Université de Montréal, Montreal, Quebec, Canada

Founder populations play an important role in the study of genetic diseases. Among their advantages is frequent access to detailed genealogical records. These genealogical data provide unique information for researchers in evolutionary and population genetics, demography and genetic epidemiology. However, analyzing large genealogical datasets requires specialized methods and software. The GENLIB software was originally developed to study the large genealogies of the French Canadian population of Quebec, Canada. These genealogies are accessible through the BALSAC database, which contains over 3 million records covering the whole province of Quebec over four centuries. Using this resource, extended pedigrees of up to 17 generations can be constructed from a sample of present-day individuals.

We have implemented GENLIB as a package in the R environment for statistical computing and graphics, which gives users considerable flexibility. GENLIB includes functions to manage genealogical data, for example to extract part of a genealogy or to select specific individuals. Functions to describe genealogies using relevant summary measures, such as genealogical completeness and generational depth, are also available. In addition, GENLIB can compute measures of relatedness (kinship and inbreeding) and the genetic contribution of founders. Finally, GENLIB provides functions for gene-dropping simulations.
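To make the summary measures above concrete, here is a minimal Python sketch of the standard recursive definitions of kinship, inbreeding, founder genetic contribution and generational completeness. It is not GENLIB's API or implementation, and it does not use BALSAC data. The toy pedigree and all function names (`kinship`, `inbreeding`, `genetic_contribution`, `completeness`) are hypothetical. The sketch assumes that parents are listed before their children and that a missing parent contributes no relatedness.

```python
from functools import lru_cache

# Hypothetical toy pedigree (not BALSAC data): individual -> (father, mother).
# Founders have (None, None); parents are always listed before their children.
# C and D are first cousins (their parents A and B are full sibs); E is their child.
PEDIGREE = {
    "F1": (None, None), "F2": (None, None), "F3": (None, None), "F4": (None, None),
    "A": ("F1", "F2"), "B": ("F1", "F2"),
    "C": ("A", "F3"), "D": ("F4", "B"),
    "E": ("C", "D"),
}
ORDER = {ind: k for k, ind in enumerate(PEDIGREE)}


@lru_cache(maxsize=None)
def kinship(i, j):
    """Probability that alleles drawn at random from i and j at a locus are IBD."""
    if i is None or j is None:  # an unknown parent contributes no relatedness
        return 0.0
    if i == j:
        father, mother = PEDIGREE[i]
        return 0.5 * (1.0 + kinship(father, mother))
    # Recurse through whichever individual is listed later: it cannot be an
    # ancestor of the other, so averaging over its parents is valid.
    if ORDER[i] > ORDER[j]:
        i, j = j, i
    father, mother = PEDIGREE[j]
    return 0.5 * (kinship(i, father) + kinship(i, mother))


def inbreeding(ind):
    """Inbreeding coefficient of ind = kinship between its two parents."""
    father, mother = PEDIGREE[ind]
    return kinship(father, mother)


@lru_cache(maxsize=None)
def genetic_contribution(founder, ind):
    """Expected fraction of ind's genome inherited from founder."""
    if ind is None:
        return 0.0
    if ind == founder:
        return 1.0
    father, mother = PEDIGREE[ind]  # other founders give (None, None) -> 0.0
    return 0.5 * (genetic_contribution(founder, father)
                  + genetic_contribution(founder, mother))


def completeness(ind, generation):
    """Percentage of the 2**generation ancestor slots filled at that generation
    (an ancestor reached through several paths is counted once per path)."""
    level = [ind]
    for _ in range(generation):
        level = [p for x in level for p in PEDIGREE[x] if p is not None]
    return 100.0 * len(level) / 2 ** generation


print(kinship("C", "D"), inbreeding("E"))  # 0.0625 0.0625 (first cousins)
print({f: genetic_contribution(f, "E") for f in ("F1", "F2", "F3", "F4")})
print([completeness("E", g) for g in range(4)])  # [100.0, 100.0, 100.0, 50.0]
```

Memoizing the recursions keeps each pairwise kinship from being recomputed. On pedigrees the size of those built from BALSAC, specialized software such as GENLIB is still needed.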

We illustrate the use of GENLIB with a sample of 140 individuals from regional populations of Quebec previously described by Roy-Gagnon et al. (2011). Ascending genealogies were reconstructed for these individuals using BALSAC, yielding a large pedigree of 41,523 individuals. With GENLIB, we provide a more detailed description of these genealogical data in terms of completeness, genetic contribution of founders, relatedness and inbreeding, further illustrating the regional differences previously reported in this sample. We also present gene-dropping simulations based on the whole genealogy to estimate the probability of sharing chromosomal segments of varying lengths identical by descent (IBD).
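As an illustration of the gene-dropping idea only, the following Python sketch estimates the probability that two relatives share an IBD segment at least a given length. It is not GENLIB's simulation routine, and the names `meiosis`, `gene_drop`, `longest_ibd` and `prob_sharing` are hypothetical. The sketch rests on several assumptions:

- a single chromosome of 1 Morgan;
- crossovers following the Haldane model (Poisson, no interference);
- unrelated founders, each given two uniquely labelled haplotypes;
- every non-founder with both parents recorded.

A segment counts as IBD when one haplotype of each individual carries the same founder label over that stretch.

```python
import random

# Hypothetical toy pedigree: individual -> (father, mother). Founders have
# (None, None); every non-founder is assumed to have both parents recorded,
# and parents are listed before their children.
PEDIGREE = {
    "F1": (None, None), "F2": (None, None), "F3": (None, None), "F4": (None, None),
    "A": ("F1", "F2"), "B": ("F1", "F2"),
    "C": ("A", "F3"), "D": ("F4", "B"),
}


def _merge(segments):
    """Join abutting segments that carry the same founder-haplotype label."""
    merged = []
    for start, end, label in segments:
        if merged and merged[-1][2] == label and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end, label)
        else:
            merged.append((start, end, label))
    return merged


def meiosis(haplotypes, length, rng):
    """One gamete from a parent's two haplotypes. Crossovers follow the Haldane
    model: a Poisson process with rate 1 per Morgan and no interference."""
    bounds = [0.0]
    pos = rng.expovariate(1.0)
    while pos < length:
        bounds.append(pos)
        pos += rng.expovariate(1.0)
    bounds.append(length)
    current = rng.randrange(2)  # parental haplotype the gamete starts on
    gamete = []
    for start, end in zip(bounds, bounds[1:]):
        for s, e, label in haplotypes[current]:
            lo, hi = max(s, start), min(e, end)
            if lo < hi:
                gamete.append((lo, hi, label))
        current = 1 - current  # switch haplotype at each crossover
    return _merge(gamete)


def gene_drop(length, rng):
    """Label each founder's two haplotypes and drop them down the pedigree."""
    genomes = {}
    for ind, (father, mother) in PEDIGREE.items():
        if father is None:
            genomes[ind] = ([(0.0, length, ind + "/1")], [(0.0, length, ind + "/2")])
        else:
            genomes[ind] = (meiosis(genomes[father], length, rng),
                            meiosis(genomes[mother], length, rng))
    return genomes


def longest_ibd(genome_i, genome_j):
    """Longest stretch (Morgans) where a haplotype of i and one of j share a founder label."""
    best = 0.0
    for hap_i in genome_i:
        for hap_j in genome_j:
            for s1, e1, l1 in hap_i:
                for s2, e2, l2 in hap_j:
                    if l1 == l2:
                        best = max(best, min(e1, e2) - max(s1, s2))
    return best


def prob_sharing(i, j, min_length, length=1.0, reps=10_000, seed=1):
    """Monte Carlo estimate of P(i and j share an IBD segment >= min_length Morgans)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        genomes = gene_drop(length, rng)
        if longest_ibd(genomes[i], genomes[j]) >= min_length:
            hits += 1
    return hits / reps


for cm in (5, 10, 20, 50):
    print(f">= {cm} cM:", prob_sharing("C", "D", cm / 100))
```

Because each run with the same seed replays identical simulations, estimates for different length thresholds are directly comparable. Longer thresholds can only lower the estimated probability.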

In conclusion, the R package GENLIB provides a user-friendly and flexible environment for analyzing genealogical data. It makes integrating different types of data and analytical methods easier and more efficient, making it well suited to further development.