Expression of Transposable Elements as Noncoding Transcripts

LeeAnn Ramsay, Guillaume Bourque

McGill University, Department of Human Genetics

Recent research has revealed that a large proportion of long non-coding RNAs (lncRNAs) are derived from transposable elements (TEs). Several of these repeat-derived lncRNAs have experimentally validated functions. For example, the human endogenous retrovirus subfamily H (HERV-H) is specifically expressed in human embryonic stem cells and is required for maintaining their pluripotency.

Using bioinformatic techniques, we aim to identify repeat-derived lncRNAs whose sequence and expression are conserved across primate species. Because sequence and expression conservation are good indicators of functionality, conserved TE-derived lncRNAs are the strongest candidates for having a function. The analysis was performed on RNA-seq data from induced pluripotent stem cells of four primate species (human, chimpanzee, gorilla, and rhesus). We first identified conserved TEs using pairwise alignments between species, which revealed that over 80% of transposable elements in non-human primates have homologous sequences in human. Next, we examined the expression pattern of transposable elements, overlaid these data with lncRNA annotations, and compared the expression of the resulting regions between species. We found that the expression pattern of repeat classes is conserved across the primate species. In addition, over 80% of repeats expressed in human have orthologous regions in chimpanzee, and about 70% in gorilla and rhesus. By identifying conserved repeat-derived lncRNAs, we hope to narrow the list of candidate non-coding transcripts to those worth validating functionally in a lab setting.
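
The two bookkeeping steps described above, measuring what share of expressed repeats have an orthologous region in another species and finding which repeats overlap annotated lncRNAs, can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the function names, the (chromosome, start, end) interval format, and the repeat identifiers are all hypothetical, and the inputs are assumed to have been produced upstream (for example from pairwise alignments and RNA-seq quantification).

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical interval format: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def fraction_with_ortholog(expressed: Set[str], with_ortholog: Set[str]) -> float:
    """Share of expressed repeats that also have an orthologous region."""
    if not expressed:
        return 0.0
    return len(expressed & with_ortholog) / len(expressed)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def repeats_in_lncrnas(
    repeats: Dict[str, Interval], lncrnas: Iterable[Interval]
) -> List[str]:
    """Sorted IDs of repeats overlapping at least one lncRNA annotation."""
    lnc = list(lncrnas)
    return sorted(
        rid for rid, iv in repeats.items() if any(overlaps(iv, l) for l in lnc)
    )


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the study.
    repeats = {
        "repeat_a": ("chr1", 100, 500),
        "repeat_b": ("chr1", 1000, 1500),
        "repeat_c": ("chr2", 50, 350),
    }
    expressed_in_human = set(repeats)
    has_chimp_ortholog = {"repeat_a", "repeat_c"}
    lncrna_annotations = [("chr1", 400, 900), ("chr2", 400, 800)]

    print(fraction_with_ortholog(expressed_in_human, has_chimp_ortholog))
    print(repeats_in_lncrnas(repeats, lncrna_annotations))
```

The pairwise overlap check is quadratic and is meant only to make the logic explicit; for genome-scale annotations an interval tree or a dedicated interval tool would be the more realistic choice.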