Publication Details

Category Text Publication
Reference Category Journals
DOI 10.1093/bioinformatics/btr314
Title (Primary) Computational discovery of human coding and non-coding transcripts with conserved splice sites
Author Rose, D.; Hiller, M.; Schutt, K.; Hackermüller, J.; Backofen, R.; Stadler, P.F.
Source Title Bioinformatics
Year 2011
Department PROTEOM
Volume 27
Issue 14
Page From 1894
Page To 1900
Language English
Abstract

Motivation: Long non-coding RNAs (lncRNAs) resemble protein-coding mRNAs but do not encode proteins. Most lncRNAs are under lower sequence constraints than protein-coding genes and lack conserved secondary structures, making it hard to predict them computationally.
Results: We introduce an approach to predict spliced lncRNAs in vertebrate genomes combining comparative genomics and machine learning. It is based on detecting signatures of characteristic splice site evolution in vertebrate whole genome alignments. First, we predict individual splice sites, then assemble compatible sites into exon candidates, and finally predict multi-exon transcripts. Using a novel method to evaluate typical splice site substitution patterns that explicitly takes the species phylogeny into account, we show that individual splice sites can be accurately predicted. Since our approach relies only on predicted splice sites, it can uncover both coding and non-coding exons. We show that our predicted exons and partial transcripts are mostly non-coding and lack conserved secondary structures. These exons are of particular interest, since existing computational approaches cannot detect them. Transcriptome sequencing data indicate tissue-specific expression patterns of predicted exons and there is evidence that increasing sequencing depth and breadth will validate additional predictions. We also found a significant enrichment of predicted exons that form multi-exon transcript parts, and we experimentally validate such a novel multi-exon gene. Overall, we obtain 336 novel multi-exon transcript predictions from human intergenic regions. Our results indicate the existence of novel human transcripts that are conserved in evolution and our approach contributes to the completion of the human transcript catalog.
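The abstract outlines a three-stage pipeline: predict individual splice sites, assemble compatible sites into exon candidates, then link exons into multi-exon transcripts. This record does not state the paper's scoring model or its compatibility criteria, so the following is only a minimal sketch of the assembly idea, not the authors' implementation. The site representation, the function names `exon_candidates` and `chain_exons`, and the exon and intron length bounds are all hypothetical. The phylogeny-aware splice site scoring is not modelled; the input is assumed to be already-predicted sites on a single strand.

```python
from typing import List, Tuple

# Hypothetical representation (assumption, not from the paper): each predicted
# splice site is (position, kind), where kind is "acceptor" (first exon base)
# or "donor" (last exon base); all sites lie on the same strand.
Site = Tuple[int, str]
Exon = Tuple[int, int]


def exon_candidates(sites: List[Site], min_len: int = 30, max_len: int = 500) -> List[Exon]:
    """Pair each acceptor with every downstream donor giving an exon length within
    the (hypothetical) bounds [min_len, max_len]."""
    acceptors = sorted(p for p, k in sites if k == "acceptor")
    donors = sorted(p for p, k in sites if k == "donor")
    exons = []
    for a in acceptors:
        for d in donors:
            length = d - a + 1
            if length < min_len:
                continue  # donor is upstream of, or too close to, the acceptor
            if length > max_len:
                break  # donors are sorted, so every later donor is even farther away
            exons.append((a, d))
    return exons


def chain_exons(exons: List[Exon], min_intron: int = 60,
                max_intron: int = 100_000) -> List[Tuple[Exon, Exon]]:
    """Link ordered exon pairs separated by an intron of plausible (hypothetical) length."""
    links = []
    for e1 in exons:
        for e2 in exons:
            intron = e2[0] - e1[1] - 1  # bases strictly between the donor and the next acceptor
            if min_intron <= intron <= max_intron:
                links.append((e1, e2))
    return links


sites = [(100, "acceptor"), (180, "donor"), (400, "acceptor"), (450, "donor"), (1000, "donor")]
exons = exon_candidates(sites)
print(exons)               # [(100, 180), (100, 450), (400, 450)]
print(chain_exons(exons))  # [((100, 180), (400, 450))]
```

Enumerating all acceptor/donor pairs within length bounds keeps every compatible candidate. A real pipeline would then score and filter these candidates, which is where the evolutionary splice site signal described in the abstract would come in.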

Persistent UFZ Identifier https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=11359
Rose, D., Hiller, M., Schutt, K., Hackermüller, J., Backofen, R., Stadler, P.F. (2011):
Computational discovery of human coding and non-coding transcripts with conserved splice sites
Bioinformatics 27 (14), 1894 - 1900 10.1093/bioinformatics/btr314