Publication Details

Category Text Publication
Reference Category Journals
DOI 10.1093/gigascience/giag040
Licence Creative Commons licence
Title (Primary) HVRLocator: A computationally efficient tool for identifying hypervariable regions in 16S rRNA big datasets
Author Arboleda Baena, C.M.; Borim Correa, F.; Saraiva, J.P.; Castillo-Rivadeneira, S.; Kasmanas, J.C.; Chatzinotas, A.; Jurburg, S.D.
Source Title GigaScience
Year 2026
Department iDiv; AME
Language English
Topic T7 Bioeconomy
Keywords Big data; 16S rRNA gene; metabarcoding; high throughput sequencing; metadata; microbial ecology
Abstract Background
Metabarcoding of the 16S rRNA gene is widely used to assess microbial diversity due to its cost-effectiveness and efficiency. However, publicly available 16S rRNA metabarcoding datasets often lack standardized metadata, particularly information on the sequenced hypervariable regions or primers used, which are critical to their accurate reuse. To address this, we present HVRLocator, a computational tool that (1) identifies the start and end positions of 16S rRNA amplicons, (2) determines their corresponding hypervariable regions, and (3) detects the presence of primer sequences. This tool was validated on four datasets comprising 41,513 samples generated with different primers and sequencing platforms.
Results
HVRLocator can process archived 16S rRNA sequences from NCBI SRA at an average rate of 6.5 samples per minute. Validation showed it reliably detects amplicon start and end positions across datasets sequenced with different primers and platforms, achieving 100% accuracy within single-platform studies and correctly revealing length heterogeneity across platforms. It also flagged misannotated metadata and problematic sequences, underscoring its value as a sequence data curation tool. Finally, HVRLocator can select comparable sequences to build large 16S rRNA amplicon databases spanning the same hypervariable region, facilitating cross-study comparisons.
Conclusion
HVRLocator overcomes unreliable metadata by accurately identifying 16S rRNA amplicon start and end positions, determining hypervariable regions, and detecting primer sequences, enabling accurate curation and large-scale processing of 16S rRNA data for reliable and reproducible microbial studies, syntheses, and meta-analyses.
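For illustration only, two of the steps named in the abstract (assigning hypervariable regions from amplicon start and end positions, and detecting a primer at the start of a read) can be sketched in a few lines of Python. This is not HVRLocator's code or interface. The region coordinates are approximate, commonly cited E. coli positions and are an assumption here, as are the function names `regions_spanned` and `has_primer` and the 50% coverage threshold. The sketch also assumes the start and end positions are already expressed in E. coli reference coordinates.

```python
import re

# Approximate E. coli 16S rRNA coordinates for V1-V9.
# Assumption for illustration; not taken from HVRLocator.
HVR_COORDS = {
    "V1": (69, 99),
    "V2": (137, 242),
    "V3": (433, 497),
    "V4": (576, 682),
    "V5": (822, 879),
    "V6": (986, 1043),
    "V7": (1117, 1173),
    "V8": (1243, 1294),
    "V9": (1435, 1465),
}

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def regions_spanned(start, end, min_fraction=0.5):
    """List regions with at least min_fraction of their length in [start, end]."""
    hits = []
    for name, (r_start, r_end) in HVR_COORDS.items():
        overlap = min(end, r_end) - max(start, r_start) + 1
        if overlap > 0 and overlap / (r_end - r_start + 1) >= min_fraction:
            hits.append(name)
    return hits


def has_primer(read, primer):
    """Return True if the read begins with the (possibly degenerate) primer."""
    pattern = "".join(IUPAC[base] for base in primer.upper())
    return re.match(pattern, read.upper()) is not None
```

With these assumed coordinates, `regions_spanned(515, 806)` returns `["V4"]` and `regions_spanned(341, 785)` returns `["V3", "V4"]`. A read that still carries the widely used 515F primer (`GTGYCAGCMGCCGCGGTAA`) at its 5' end makes `has_primer` return `True`. Exact-prefix matching is the simplest possible check; a production tool would likely tolerate mismatches and offsets.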
Arboleda Baena, C.M., Borim Correa, F., Saraiva, J.P., Castillo-Rivadeneira, S., Kasmanas, J.C., Chatzinotas, A., Jurburg, S.D. (2026):
HVRLocator: A computationally efficient tool for identifying hypervariable regions in 16S rRNA big datasets
GigaScience
10.1093/gigascience/giag040