Publication Details |
Category | Text Publication |
Reference Category | Journals |
DOI | 10.1186/s13321-025-00950-4 |
Title (Primary) | MLinvitroTox reloaded for high-throughput hazard-based prioritization of high-resolution mass spectrometry data |
Author | Arturi, K.; Harris, E.J.; Gasser, L.; Escher, B.I.; Braun, G.; Bosshard, R.; Hollender, J. |
Source Title | Journal of Cheminformatics |
Year | 2025 |
Department | ZELLTOX |
Volume | 17 |
Page From | art. 14 |
Language | English |
Topic | T9 Healthy Planet |
Data and Software links | https://doi.org/10.25678/00041J https://doi.org/10.5281/zenodo.13323297 |
Supplements | https://static-content.springer.com/esm/art%3A10.1186%2Fs13321-025-00950-4/MediaObjects/13321_2025_950_MOESM1_ESM.pdf |
Abstract | MLinvitroTox is an automated Python pipeline developed for high-throughput hazard-driven prioritization of toxicologically relevant signals detected in complex environmental samples through high-resolution tandem mass spectrometry (HRMS/MS). MLinvitroTox is a machine learning (ML) framework comprising 490 independent XGBoost classifiers trained on molecular fingerprints from chemical structures and target-specific endpoints from the ToxCast/Tox21 invitroDBv4.1 database. For each analyzed HRMS feature, MLinvitroTox generates a 490-bit bioactivity fingerprint used as a basis for prioritization, focusing the time-consuming molecular identification efforts on features most likely to cause adverse effects. The practical advantages of MLinvitroTox are demonstrated for groundwater HRMS data. Among the 874 features for which molecular fingerprints were derived from spectra, including 630 nontargets, 185 spectral matches, and 59 targets, around 4% of the feature/endpoint relationship pairs were predicted to be active. Cross-checking the predictions for targets and spectral matches with invitroDB data confirmed the bioactivity of 120 active and 6791 non-active pairs while mislabeling 88 active and 56 non-active relationships. By filtering according to bioactivity probability, endpoint scores, and similarity to the training data, the number of potentially toxic features was reduced by at least one order of magnitude. This refinement makes the analytical confirmation of the toxicologically most relevant features feasible, offering significant benefits for cost-efficient chemical risk assessment. Scientific Contribution: In contrast to the classical ML-based approaches for toxicity prediction, MLinvitroTox predicts bioactivity for HRMS features (i.e., distinct m/z signals) based on MS2 fragmentation spectra rather than on the chemical structures of the identified features. While the original proof-of-concept study was accompanied by the release of an MLinvitroTox v1 KNIME workflow, in this study we release a Python MLinvitroTox v2 package, which, in addition to automation, expands functionality to include predicting toxicity from structures, cleaning up and generating chemical fingerprints, customizing models, and retraining on custom data. Furthermore, as a result of improvements in bioactivity data processing, realized in the concurrently released pytcpl Python package for the custom processing of invitroDBv4.1 input data used for training MLinvitroTox, the current release introduces enhancements in model accuracy, coverage of biological mechanistic targets, and overall interpretability. |
Persistent UFZ Identifier | https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=30489 |
Arturi, K., Harris, E.J., Gasser, L., Escher, B.I., Braun, G., Bosshard, R., Hollender, J. (2025): MLinvitroTox reloaded for high-throughput hazard-based prioritization of high-resolution mass spectrometry data. J. Cheminformatics 17, art. 14. 10.1186/s13321-025-00950-4 |
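
The cross-check counts reported in the abstract can be turned into standard classification metrics. The sketch below is illustrative only and is not part of the MLinvitroTox package. It assumes that "mislabeling 88 active" means 88 truly active pairs predicted as non-active (false negatives) and that "56 non-active" means 56 truly non-active pairs predicted as active (false positives). The abstract does not state this assignment explicitly, so recall, precision, and specificity depend on that reading; overall accuracy does not.

```python
# Counts taken from the abstract (targets and spectral matches
# cross-checked against invitroDB data).
tp = 120    # active pairs confirmed as active
tn = 6791   # non-active pairs confirmed as non-active

# ASSUMPTION: the abstract does not say which direction each error goes.
fn = 88     # assumed: truly active pairs predicted non-active
fp = 56     # assumed: truly non-active pairs predicted active

total = tp + tn + fn + fp

# Accuracy is independent of the FN/FP assumption above.
accuracy = (tp + tn) / total

# These three metrics change if the FN/FP assignment is swapped.
recall = tp / (tp + fn)
precision = tp / (tp + fp)
specificity = tn / (tn + fp)

print(f"pairs checked: {total}")
print(f"accuracy:      {accuracy:.3f}")
print(f"recall:        {recall:.3f}")
print(f"precision:     {precision:.3f}")
print(f"specificity:   {specificity:.3f}")
```

Under either reading of the error counts, 6911 of the 7055 cross-checked pairs (about 98%) were labeled correctly. Because non-active pairs dominate the set, accuracy alone says little about how well active pairs are recovered, which is why the class-specific metrics are worth computing alongside it.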