Publication Details

Category Text Publication
Reference Category Journals
DOI 10.3389/fgene.2023.1250907
Licence Creative Commons licence
Title (Primary) Fully automated annotation of mitochondrial genomes using a cluster-based approach with de Bruijn graphs
Author Fiedler, L.; Middendorf, M.; Bernt, M.
Source Title Frontiers in Genetics
Year 2023
Department BIOINF
Volume 14
Page From art. 1250907
Language English
Topic T9 Healthy Planet
Data and Software links https://doi.org/10.5281/zenodo.8101631
Supplements https://ndownloader.figstatic.com/files/41952981
Keywords annotation; gene prediction; mitochondria; genome; mitogenome; Metazoa; de Bruijn graph; clustering
Abstract A wide range of scientific fields, such as forensics, anthropology, medicine, and molecular evolution, benefits from the analysis of mitogenomic data. With the development of new sequencing technologies, the amount of mitochondrial sequence data to be analyzed has increased exponentially over the last few years. The accurate annotation of mitochondrial DNA is a prerequisite for any mitogenomic comparative analysis. To keep pace with the growth of the available mitochondrial sequence data, highly efficient automatic computational methods are therefore needed. Automatic annotation methods are typically based on databases that contain information about already annotated (and often pre-curated) mitogenomes of different species. However, the existing approaches have several shortcomings: 1) they do not scale well with the size of the database; 2) they do not allow for a fast (and easy) update of the database; and 3) they can only be applied to a relatively small taxonomic subset of all species. Here, we present a novel approach that has none of these three shortcomings. The reference database of mitogenomes is represented as a richly annotated de Bruijn graph. To generate gene predictions for a new user-supplied mitogenome, the method maps the provided sequence to this graph and applies a clustering routine to the resulting mapping information. The method is implemented in a software package called DeGeCI (De Bruijn graph Gene Cluster Identification). For a large set of mitogenomes for which expert-curated annotations are available, DeGeCI generates gene predictions of high conformity. In a comparative evaluation with MITOS2, a state-of-the-art annotation tool for mitochondrial genomes, DeGeCI shows better database scalability while still matching MITOS2 in terms of result quality and providing a fully automated means to update the underlying database. Moreover, unlike MITOS2, DeGeCI can be run in parallel on several processors to make use of modern multi-processor systems.
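Illustration The idea of an annotated de Bruijn graph mentioned in the abstract can be sketched in a few lines of Python. This is only a toy illustration and not DeGeCI's implementation: the function names, the data layout, and the rule that a k-mer is labelled only when it lies entirely inside a gene interval are all assumptions made for this example. The clustering routine that turns mapped positions into gene predictions is not reproduced here.

```python
from collections import defaultdict


def build_annotated_graph(references, k):
    """Build a toy annotated de Bruijn graph (hypothetical helper).

    references: list of (sequence, genes), where genes is a list of
    (name, start, end) tuples with half-open intervals [start, end).
    Each k-mer is stored as an edge from its (k-1)-mer prefix to its
    (k-1)-mer suffix, together with the gene names covering it.
    """
    edges = defaultdict(set)
    for sequence, genes in references:
        for i in range(len(sequence) - k + 1):
            kmer = sequence[i:i + k]
            edge = (kmer[:-1], kmer[1:])
            edges[edge]  # register the edge even if it carries no label
            for name, start, end in genes:
                # Assumption: label only k-mers fully inside the gene.
                if start <= i and i + k <= end:
                    edges[edge].add(name)
    return edges


def map_query(edges, query, k):
    """Return (position, gene names) for each query k-mer with labels."""
    hits = []
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        labels = edges.get((kmer[:-1], kmer[1:]))
        if labels:
            hits.append((i, sorted(labels)))
    return hits


# Made-up reference with two made-up gene intervals.
references = [("ACGTACGGTT", [("geneA", 0, 5), ("geneB", 5, 10)])]
graph = build_annotated_graph(references, k=3)
hits = map_query(graph, "CGTAC", k=3)
```

In this sketch, adding a newly annotated mitogenome only means adding its k-mers and labels to the existing dictionary, which is one way to picture why a graph-based database can be updated incrementally.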
Persistent UFZ Identifier https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=27798
Fiedler, L., Middendorf, M., Bernt, M. (2023):
Fully automated annotation of mitochondrial genomes using a cluster-based approach with de Bruijn graphs
Front. Genet. 14, art. 1250907, 10.3389/fgene.2023.1250907