Publication details |
Category | Text publication |
Reference type | Journals |
DOI | 10.1186/s13321-025-01000-9 |
Title (primary) | Prediction of the water solubility by a graph convolutional-based neural network on a highly curated dataset |
Author | Ulrich, N.; Voigt, K.; Kudria, A.; Böhme, A.; Ebert, R.-U. |
Source | Journal of Cheminformatics |
Year of publication | 2025 |
Department | EXPO |
Volume | 17 |
First page | art. 55 |
Language | English |
Topic | T9 Healthy Planet |
Supplements | https://static-content.springer.com/esm/art%3A10.1186%2Fs13321-025-01000-9/MediaObjects/13321_2025_1000_MOESM1_ESM.pdf |
Keywords | Water solubility; Neural networks; Machine learning; Physico-chemical property prediction |
Abstract | Water solubility is a relevant physico-chemical property in environmental chemistry, toxicology, and drug design. Although water solubility is, alongside the octanol–water partition coefficient, melting point, and boiling point, a property with a large amount of available experimental data, there are still many more compounds in the chemical universe for which this information is lacking. Thus, prediction tools with a broad application domain are needed to fill the corresponding data gaps. To this end, we developed a graph convolutional neural network model (GNN) to predict the water solubility in the form of log Sw based on a highly curated dataset of 9800 chemicals. We started our model development with a curation workflow of the AqSolDB data, ending with 7605 data points. We then added 2195 chemicals with experimental data that we found in the literature. In the final dataset, log Sw values range from −13.17 to 0.50. Higher values were excluded by a cut-off introduced to eliminate fully miscible chemicals. We developed a consensus GNN by a fivefold split of the corresponding training set (70% of the data) and validation set (20%), and used the remaining 10% as an independent test set to evaluate the performance of the different splits and of the consensus model. By doing so, we achieved an r² of 0.901, a q² of 0.896, and an RMSE of 0.657 on our independently selected test set, which is close to the experimental error of 0.5 to 0.6 log units. We further provide information on the application domain and compare our performance to other existing prediction tools. Scientific contribution: Based on a highly curated dataset, we developed a neural network to predict the water solubility of chemicals for a broad application domain. We curated the data in a step-wise procedure, in which we identified various errors in the experimental data. Based on an independent test set, we compare our prediction results to those of the available prediction models. |
Permanent UFZ link | https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=30669 |
Ulrich, N., Voigt, K., Kudria, A., Böhme, A., Ebert, R.-U. (2025): Prediction of the water solubility by a graph convolutional-based neural network on a highly curated dataset. J. Cheminformatics 17, art. 55. 10.1186/s13321-025-01000-9 |
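
The evaluation scheme summarized in the abstract (a fixed 10% test set, five train/validation splits of 70%/20%, a consensus over the five models, and the r², q², and RMSE statistics) can be illustrated with a minimal Python sketch. This is not the authors' code. It rests on several assumptions that the record does not confirm: the five splits are drawn here as independent random reshuffles, the consensus is taken as the plain mean of the five predictions, r² is computed as the squared Pearson correlation, and q² as 1 − SSE/SST around the mean of the evaluated values. All function names are hypothetical.

```python
import math
import random


def split_dataset(n, seed, test_frac=0.10, val_frac=0.20, n_splits=5):
    """Hold out one fixed test set, then draw n_splits train/validation splits.

    Assumption: each split is an independent reshuffle of the non-test data.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test, rest = indices[:n_test], indices[n_test:]
    splits = []
    for _ in range(n_splits):
        shuffled = rest[:]
        rng.shuffle(shuffled)
        splits.append((shuffled[n_val:], shuffled[:n_val]))  # (train, validation)
    return test, splits


def consensus(predictions_per_model):
    """Assumed consensus rule: average the model predictions per compound."""
    return [sum(column) / len(column) for column in zip(*predictions_per_model)]


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of y (here log units)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


def q2(y_true, y_pred):
    """1 - SSE/SST; unlike r2, this penalizes systematic offsets."""
    mean_t = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - sse / sst


# With the 9800 chemicals of the final dataset, the split sizes are
# 6860 (train), 1960 (validation), and 980 (test).
test_idx, splits = split_dataset(9800, seed=42)
```

Under these definitions q² can never exceed r², which is consistent with the reported values (q² of 0.896 against r² of 0.901). The small gap would indicate little systematic bias in the consensus predictions.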