Publication details

Category Text publication
Reference type Journals
DOI 10.1093/bioinformatics/btr314
Title (primary) Computational discovery of human coding and non-coding transcripts with conserved splice sites
Author Rose, D.; Hiller, M.; Schutt, K.; Hackermüller, J.; Backofen, R.; Stadler, P.F.
Source Bioinformatics
Year of publication 2011
Department PROTEOM
Volume 27
Issue 14
First page 1894
Last page 1900
Language English

Motivation: Long non-coding RNAs (lncRNAs) resemble protein-coding mRNAs but do not encode proteins. Most lncRNAs are under lower sequence constraints than protein-coding genes and lack conserved secondary structures, making it hard to predict them computationally.
Results: We introduce an approach to predict spliced lncRNAs in vertebrate genomes combining comparative genomics and machine learning. It is based on detecting signatures of characteristic splice site evolution in vertebrate whole genome alignments. First, we predict individual splice sites, then assemble compatible sites into exon candidates, and finally predict multi-exon transcripts. Using a novel method to evaluate typical splice site substitution patterns that explicitly takes the species phylogeny into account, we show that individual splice sites can be accurately predicted. Since our approach relies only on predicted splice sites, it can uncover both coding and non-coding exons. We show that our predicted exons and partial transcripts are mostly non-coding and lack conserved secondary structures. These exons are of particular interest, since existing computational approaches cannot detect them. Transcriptome sequencing data indicate tissue-specific expression patterns of predicted exons and there is evidence that increasing sequencing depth and breadth will validate additional predictions. We also found a significant enrichment of predicted exons that form multi-exon transcript parts, and we experimentally validate such a novel multi-exon gene. Overall, we obtain 336 novel multi-exon transcript predictions from human intergenic regions. Our results indicate the existence of novel human transcripts that are conserved in evolution and our approach contributes to the completion of the human transcript catalog.
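
The three stages named in the abstract (individual splice sites, then exon candidates, then multi-exon transcripts) can be illustrated with a small sketch. This is not the authors' implementation: the `SpliceSite` and `Exon` types, the score scale, the length and intron thresholds, the product-of-scores exon score, and the dynamic-programming chaining are all assumptions chosen for illustration, and only the plus strand is handled. Splice site scores are taken as given input; the phylogeny-aware scoring described in the abstract is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpliceSite:
    pos: int      # plus-strand genomic coordinate
    kind: str     # "acceptor" (exon start) or "donor" (exon end)
    score: float  # hypothetical classifier confidence in [0, 1]


@dataclass(frozen=True)
class Exon:
    start: int
    end: int
    score: float


def exon_candidates(sites: List[SpliceSite], min_len: int = 30,
                    max_len: int = 500, min_score: float = 0.5) -> List[Exon]:
    """Pair each confident acceptor with every confident downstream donor
    that lies at a plausible exon length (thresholds are assumptions)."""
    good = [s for s in sites if s.score >= min_score]
    acceptors = [s for s in good if s.kind == "acceptor"]
    donors = [s for s in good if s.kind == "donor"]
    exons = [Exon(a.pos, d.pos, a.score * d.score)
             for a in acceptors for d in donors
             if min_len <= d.pos - a.pos <= max_len]
    # Sorting by end guarantees any valid predecessor precedes its successor.
    return sorted(exons, key=lambda e: (e.end, e.start))


def best_transcript(exons: List[Exon], min_intron: int = 70,
                    max_intron: int = 10_000) -> List[Exon]:
    """Return the highest-scoring chain of exons whose gaps are plausible
    intron lengths. Expects the end-sorted list from exon_candidates."""
    if not exons:
        return []
    best = [e.score for e in exons]   # best chain score ending at exon i
    prev = [-1] * len(exons)          # back-pointer for chain recovery
    for i, e in enumerate(exons):
        for j in range(i):
            gap = e.start - exons[j].end
            if min_intron <= gap <= max_intron and best[j] + e.score > best[i]:
                best[i] = best[j] + e.score
                prev[i] = j
    i = max(range(len(exons)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(exons[i])
        i = prev[i]
    return chain[::-1]


if __name__ == "__main__":
    # Made-up example sites, not data from the paper.
    sites = [
        SpliceSite(100, "acceptor", 0.9), SpliceSite(200, "donor", 0.8),
        SpliceSite(1000, "acceptor", 0.9), SpliceSite(1120, "donor", 0.9),
        SpliceSite(5000, "donor", 0.2),  # dropped: below min_score
    ]
    for exon in best_transcript(exon_candidates(sites)):
        print(exon.start, exon.end)
```

Because the sketch uses only splice site positions and scores, never coding potential or secondary structure, it treats coding and non-coding exons alike, which mirrors the property the abstract highlights.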

Permanent UFZ link
Rose, D., Hiller, M., Schutt, K., Hackermüller, J., Backofen, R., Stadler, P.F. (2011):
Computational discovery of human coding and non-coding transcripts with conserved splice sites
Bioinformatics 27 (14), 1894-1900. DOI: 10.1093/bioinformatics/btr314