Publication Details

Category Text Publication
Reference Category Journals
DOI 10.1039/d2va00225f
Licence Creative Commons licence
Title (Primary) Getting the SMILES right: identifying inconsistent chemical identities in the ECHA database, PubChem and the CompTox Chemicals Dashboard
Author Glüge, J.; McNeill, K.; Scheringer, M.
Source Title Environmental Science-Advances
Year 2023
Department ZELLTOX
Volume 2
Issue 4
Page From 612
Page To 621
Language English
Topic T9 Healthy Planet
Supplements https://www.rsc.org/suppdata/d2/va/d2va00225f/d2va00225f1.xlsx
https://www.rsc.org/suppdata/d2/va/d2va00225f/d2va00225f2.pdf
Abstract Chemical databases containing information on substances and their identities are important and useful tools, used in many areas of chemistry and cheminformatics. Errors or inconsistencies in the identities of substances in the databases are a major problem, as they can make QSAR predictions inaccurate, make chemical hazard and risk assessments erroneous, and cause problems for the ordering of chemicals and analytical standards. In the present study, we checked the entries of all mono-constituent organic substances registered under REACH (more than 8500 substances) in the database of the European Chemicals Agency (ECHA), PubChem and the CompTox Chemicals Dashboard and flagged compounds with inconsistent chemical identifiers. In total 736 inconsistent entries, and 48 additional entries where the substance identity was not clear, were identified. This shows that data curation activities are still not sufficient in the databases and that more work needs to be done. Additionally, the identified inconsistent entries were analyzed to understand what kind of mismatches have been introduced in the databases and to avoid these mismatches in the future. Data gathering and processing is described in detail in the current study so that further studies can continue with this work for additional substances and databases. In this way, the study makes an important contribution towards improved and more trustworthy databases.
Persistent UFZ Identifier https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=23316
Glüge, J., McNeill, K., Scheringer, M. (2023):
Getting the SMILES right: identifying inconsistent chemical identities in the ECHA database, PubChem and the CompTox Chemicals Dashboard
Environmental Science-Advances 2 (4), 612 - 621, DOI 10.1039/d2va00225f