Publication details

Category Text publication
Reference type Journals
DOI 10.1186/s13321-025-01000-9
License Creative Commons license
Title (primary) Prediction of the water solubility by a graph convolutional-based neural network on a highly curated dataset
Authors Ulrich, N.; Voigt, K.; Kudria, A.; Böhme, A.; Ebert, R.-U.
Source Journal of Cheminformatics
Year of publication 2025
Department EXPO
Volume 17
Page from art. 55
Language English
Topic T9 Healthy Planet
Supplements https://static-content.springer.com/esm/art%3A10.1186%2Fs13321-025-01000-9/MediaObjects/13321_2025_1000_MOESM1_ESM.pdf
Keywords Water solubility; Neural networks; Machine learning; Physico-chemical property prediction
Abstract Water solubility is a relevant physico-chemical property in environmental chemistry, toxicology, and drug design. Although water solubility is, besides the octanol–water partition coefficient, melting point, and boiling point, a property with a large amount of available experimental data, there are still many more compounds in the chemical universe for which information on water solubility is lacking. Thus, prediction tools with a broad application domain are needed to fill the corresponding data gaps. To this end, we developed a graph convolutional neural network model (GNN) to predict the water solubility in the form of log Sw based on a highly curated dataset of 9800 chemicals. We started our model development with a curation workflow of the AqSolDB data, ending with 7605 data points. We added 2195 chemicals with experimental data, which we found in the literature, to our dataset. In the final dataset, log Sw values range from −13.17 to 0.50. Higher values were excluded by a cut-off introduced to eliminate fully miscible chemicals. We developed a consensus GNN by a fivefold split of the corresponding training set (70% of the data) and validation set (20%) and used 10% as an independent test set for the evaluation of the performance of the different splits and the consensus model. By doing so, we achieved an r2 of 0.901, a q2 of 0.896, and an rmse of 0.657 on our independently selected test set, which is close to the experimental error of 0.5 to 0.6 log units. We further provide information on the application domain and compare our performance to other existing prediction tools.
Scientific contribution Based on a highly curated dataset, we developed a neural network to predict the water solubility of chemicals for a broad application domain. Data curation was done by us in a step-wise procedure, where we identified various errors in the experimental data. Based on an independent test set, we compare our prediction results to those of the available prediction models.
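Illustrative sketch The abstract reports a 70/20/10 split, a consensus over five models, and the statistics r2, q2, and rmse. The following Python sketch shows one way such quantities could be computed. It is not the authors' code. It assumes that the consensus is a plain average of the five model predictions, that r2 is the squared Pearson correlation, and that q2 is 1 minus the ratio of the residual sum of squares to the sum of squares around a reference mean. All function names are hypothetical, and the record itself does not confirm these definitions.

```python
import math
from statistics import fmean


def split_sizes(n, val_frac=0.2, test_frac=0.1):
    """Return (train, validation, test) counts; the remainder is the training set."""
    n_val = round(n * val_frac)
    n_test = round(n * test_frac)
    return n - n_val - n_test, n_val, n_test


def consensus(predictions_per_model):
    """Average per-compound predictions across models (assumed consensus rule).

    predictions_per_model holds one list of predictions per model,
    all in the same compound order.
    """
    return [fmean(values) for values in zip(*predictions_per_model)]


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def r2(y_true, y_pred):
    """Squared Pearson correlation (assumed definition of r2)."""
    mean_t, mean_p = fmean(y_true), fmean(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


def q2(y_true, y_pred, reference_mean):
    """Predictive squared correlation coefficient (assumed definition of q2).

    reference_mean is the mean the residuals are compared against,
    for example the mean of the training-set values.
    """
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - reference_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Applied exactly to 9800 chemicals, the stated fractions would correspond to about 6860 training, 1960 validation, and 980 test compounds; the actual counts used in the paper may differ. Unlike r2, q2 as defined here penalizes systematic offsets between predicted and observed log Sw values, which is why the two statistics can differ.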
Permanent UFZ link https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=30669
Ulrich, N., Voigt, K., Kudria, A., Böhme, A., Ebert, R.-U. (2025):
Prediction of the water solubility by a graph convolutional-based neural network on a highly curated dataset
J. Cheminformatics 17, art. 55, 10.1186/s13321-025-01000-9