Publication Details

Category Text Publication
Reference Category Journals
DOI 10.1038/s42003-020-01204-9
Licence Creative Commons licence
Title (Primary) The archives are half-empty: an assessment of the availability of microbial community sequencing data
Author Jurburg, S.D.; Konzack, M.; Eisenhauer, N.; Heintz-Buschart, A.
Source Title Communications Biology
Year 2020
Department BOOEK; iDiv
Volume 3
Page From art. 474
Language English
Data and Software links https://doi.org/10.5281/zenodo.3953307
https://doi.org/10.5281/zenodo.3953313
Supplements https://static-content.springer.com/esm/art%3A10.1038%2Fs42003-020-01204-9/MediaObjects/42003_2020_1204_MOESM3_ESM.xlsx
Abstract As DNA sequencing has become more popular, the public genetic repositories where sequences are archived have experienced explosive growth. These repositories now hold invaluable collections of sequences, e.g., for microbial ecology, but whether these data are reusable has not been evaluated. We assessed the availability and state of 16S rRNA gene amplicon sequences archived in public genetic repositories (SRA, EBI, and DDBJ). We screened 26,927 publications in 17 microbiology journals, identifying 2015 16S rRNA gene sequencing studies. Of these, 7.2% had not made their data public at the time of analysis. Among a subset of 635 studies sequencing the same gene region, 40.3% contained data which was not available or not reusable, and an additional 25.5% contained faults in data formatting or data labeling, creating obstacles for data reuse. Our study reveals gaps in data availability, identifies major contributors to data loss, and offers suggestions for improving data archiving practices.
Persistent UFZ Identifier https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=23578
Jurburg, S.D., Konzack, M., Eisenhauer, N., Heintz-Buschart, A. (2020):
The archives are half-empty: an assessment of the availability of microbial community sequencing data
Commun. Biol. 3, art. 474, 10.1038/s42003-020-01204-9