Publication Details

Category Text Publication
Reference Category Journals
DOI 10.1186/s12859-023-05371-4
Licence creative commons licence
Title (Primary) Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs
Author Fiedler, L.; Bernt, M.; Middendorf, M.; Stadler, P.F.
Source Title BMC Bioinformatics
Year 2023
Department BIOINF
Volume 24
Page From art. 235
Language English
Topic T9 Healthy Planet
Supplements https://ndownloader.figstatic.com/files/41069051
Keywords Gene breakpoints; de-Bruijn graph; Genome; Mitochondria
Abstract

Background

Identifying the locations of gene breakpoints between species of different taxonomic groups can provide useful insights into the underlying evolutionary processes. Given the exact locations of their genes, the breakpoints can be computed without much effort. Often, however, existing gene annotations are erroneous, or only nucleotide sequences are available. In mitochondrial genomes especially, high variation in gene order is usually accompanied by a high degree of sequence inconsistency. This makes accurately locating breakpoints in mitogenomic nucleotide sequences a challenging task.
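As an illustration of the annotated case mentioned above, the following minimal sketch counts breakpoints between two signed gene orders. It assumes circular genomes, integer gene identifiers whose sign encodes the strand, and the common adjacency-based definition of a breakpoint. The function names are hypothetical and are not taken from the paper or from DeBBI.

```python
def gene_pairs(order, circular=True):
    """List consecutive gene pairs of a signed gene order."""
    pairs = list(zip(order, order[1:]))
    if circular and len(order) > 1:
        pairs.append((order[-1], order[0]))  # close the circle
    return pairs


def adjacencies(order, circular=True):
    """Set of adjacencies, including their reverse-strand reading."""
    adj = set()
    for a, b in gene_pairs(order, circular):
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read from the other strand
    return adj


def breakpoints(order_a, order_b, circular=True):
    """Adjacencies of order_a that are not conserved in order_b."""
    adj_b = adjacencies(order_b, circular)
    return [p for p in gene_pairs(order_a, circular) if p not in adj_b]


# Hypothetical example: genes 2 and 3 are inverted in the second genome.
print(breakpoints([1, 2, 3, 4, 5], [1, -3, -2, 4, 5]))
```

With annotations this reduces to a set lookup per adjacency; the difficulty addressed by the paper arises when such gene orders are unavailable or unreliable.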

Results

This contribution presents a novel method for detecting gene breakpoints in the nucleotide sequences of complete mitochondrial genomes, taking into account possibly high substitution rates. The method is implemented in the software package DeBBI. DeBBI can analyze transposition- and inversion-based breakpoints independently and uses a parallel program design that takes advantage of modern multi-processor systems. Extensive tests on synthetic data sets, covering a broad range of sequence dissimilarities and different numbers of introduced breakpoints, demonstrate DeBBI’s ability to produce accurate results. Case studies using species of various taxonomic groups further show DeBBI’s applicability to real-life data. While some multiple sequence alignment tools can also be used for the task at hand, we demonstrate that gene breaks between short, poorly conserved tRNA genes in particular can be detected more frequently with the proposed approach.

Conclusion

The proposed method constructs a position-annotated de-Bruijn graph of the input sequences. Using a heuristic algorithm, this graph is searched for particular structures, called bulges, which may be associated with the breakpoint locations. Despite the large size of these structures, the algorithm only requires a small number of graph traversal steps.
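To make the data structure more concrete, here is a minimal sketch of a position-annotated colored de-Bruijn graph. It is an assumption-laden illustration, not DeBBI's implementation: sequences are treated as linear, each input sequence is one color, each k-mer node stores the positions at which it occurs per color, and the helper `divergence_nodes` is a hypothetical simplification that only flags shared k-mers at which the colors continue differently (candidate starting points of a bulge), without performing the heuristic bulge search described above.

```python
from collections import defaultdict


def build_graph(sequences, k):
    """Build the graph from a dict mapping color -> sequence.

    nodes: k-mer -> {color: [start positions]}
    edges: k-mer -> {successor k-mer: set of colors using that edge}
    """
    nodes = defaultdict(lambda: defaultdict(list))
    edges = defaultdict(lambda: defaultdict(set))
    for color, seq in sequences.items():
        prev = None
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            nodes[kmer][color].append(i)  # position annotation
            if prev is not None:
                edges[prev][kmer].add(color)  # colored edge
            prev = kmer
    return nodes, edges


def divergence_nodes(nodes, edges):
    """Shared k-mers with at least one outgoing edge not used by all colors."""
    result = []
    for kmer, successors in edges.items():
        colors_here = set(nodes[kmer])
        if len(colors_here) > 1 and any(
            cols != colors_here for cols in successors.values()
        ):
            result.append(kmer)
    return result


# Hypothetical toy input: the two sequences differ at one position.
nodes, edges = build_graph({"A": "ACGTTGCA", "B": "ACGTAGCA"}, k=3)
print(dict(nodes["CGT"]))
print(divergence_nodes(nodes, edges))
```

In this toy input the two paths separate after the shared k-mer `CGT` and rejoin at `GCA`. The stored positions are what would allow such a structure to be mapped back to coordinates in each input sequence.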

Persistent UFZ Identifier https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=27209
Fiedler, L., Bernt, M., Middendorf, M., Stadler, P.F. (2023):
Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs
BMC Bioinformatics 24, art. 235 10.1186/s12859-023-05371-4