Publication details

Category Text publication
Reference type Journals
DOI 10.1021/acsomega.5c02849
License Creative Commons
Title (primary) Optimizing machine learning-based prediction of terrestrial dissolved organic matter in the ocean using fluorescence and LC-FTMS data
Authors Bareth, M.; Koch, B.P.; Zachmann, G.; Kong, X.; Lechtenfeld, O.J.; Maneth, S.
Source ACS Omega
Year of publication 2025
Department EAC
Volume 10
Issue 27
Page from 29497
Page to 29509
Language English
Topic T9 Healthy Planet
Data/software links https://doi.pangaea.de/10.1594/PANGAEA.948019
Supplements https://ndownloader.figstatic.com/files/55952115
Abstract Marine dissolved organic matter (DOM) is an extremely complex mixture of organic compounds that plays a crucial role in the global carbon cycle. In the Arctic, climate change accelerates the release of terrestrial organic carbon. Since chemical information is the only way to track DOM sources and fate, it is essential to improve analytical and data science approaches to assess the DOM composition. Here, we compare random forest (RF), support vector machines, and generalized linear models (GLM) to predict a fluorescence-derived proxy for terrestrial DOM based on molecular formula data from liquid chromatography coupled with Fourier transform mass spectrometry (LC-FTMS). We systematically evaluate different data preprocessing, normalization, and ML techniques to optimize prediction accuracy and computational efficiency. Our results show that a GLM with sum normalization provides the most accurate and efficient predictions, achieving a normalized root-mean-square error (NRMSE) of 5.7%, close to the precision of the fluorescence measurement. The prediction based on RF regression was slightly less accurate and required significantly more computation time compared to GLM, but it was most robust against data preprocessing and independent of linear correlations. Feature selection significantly improved the performance of all models, with robust predictions obtained using only ca. 2000 of the ca. 70,000 molecular features per sample. Additionally, we assessed the impact of chromatographic retention time on prediction accuracy and explored the key molecular features contributing to terrestrial DOM signatures using Shapley values and permutation importance (for RFs). Our study is a blueprint for the application of ML to enhance the analysis of high-resolution mass spectrometry data, offering a scalable approach for predicting information important for the understanding of marine DOM chemistry.
Permanent UFZ link https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=31069
Bareth, M., Koch, B.P., Zachmann, G., Kong, X., Lechtenfeld, O.J., Maneth, S. (2025):
Optimizing machine learning-based prediction of terrestrial dissolved organic matter in the ocean using fluorescence and LC-FTMS data
ACS Omega 10 (27), 29497 - 29509 10.1021/acsomega.5c02849
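
Note on terms used in the abstract: the abstract names sum normalization and NRMSE without defining them. The sketch below illustrates one common reading of both and is not the authors' code. It assumes that sum normalization divides each peak intensity by the sample's total intensity, and that the RMSE is normalized by the range of the observed values; the record does not state which normalization the paper uses. The function names `sum_normalize` and `nrmse` are hypothetical.

```python
import math


def sum_normalize(sample):
    """Scale one sample's peak intensities so that they sum to 1.

    Assumption: "sum normalization" means division by the sample total.
    """
    total = sum(sample)
    if total == 0:
        raise ValueError("sample has zero total intensity")
    return [value / total for value in sample]


def nrmse(observed, predicted):
    """Root-mean-square error divided by the range of the observed values.

    Assumption: range normalization; other conventions divide by the mean.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("observed values have zero range")
    return math.sqrt(mse) / value_range
```

Under these assumptions, an NRMSE of 5.7% would mean the typical prediction error is about 5.7% of the spread of the measured fluorescence proxy.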