Publication details |
Category | Text publication |
Reference type | Journals |
DOI | 10.1021/acsomega.5c02849 |
License |
|
Title (primary) | Optimizing machine learning-based prediction of terrestrial dissolved organic matter in the ocean using fluorescence and LC-FTMS data |
Author | Bareth, M.; Koch, B.P.; Zachmann, G.; Kong, X.; Lechtenfeld, O.J. |
Source | ACS Omega |
Year of publication | 2025 |
Department | EAC |
Volume | 10 |
Issue | 27 |
First page | 29497 |
Last page | 29509 |
Language | English |
Topic | T9 Healthy Planet |
Data/software links | https://doi.pangaea.de/10.1594/PANGAEA.948019 |
Supplements | https://ndownloader.figstatic.com/files/55952115 |
Abstract | Marine dissolved organic matter (DOM) is an extremely complex mixture of organic compounds that plays a crucial role in the global carbon cycle. In the Arctic, climate change accelerates the release of terrestrial organic carbon. Since chemical information is the only way to track DOM sources and fate, it is essential to improve analytical and data science approaches to assess the DOM composition. Here, we compare random forest (RF), support vector machines (SVM), and generalized linear models (GLM) to predict a fluorescence-derived proxy for terrestrial DOM based on molecular formula data from liquid chromatography coupled with Fourier transform mass spectrometry (LC-FTMS). We systematically evaluate different data preprocessing, normalization, and ML techniques to optimize prediction accuracy and computational efficiency. Our results show that a GLM with sum normalization provides the most accurate and efficient predictions, achieving a normalized root-mean-square error (NRMSE) of 5.7%, close to the precision of the fluorescence measurement. The prediction based on RF regression was slightly less accurate and required significantly more computation time than the GLM, but it was the most robust against data preprocessing and independent of linear correlations. Feature selection significantly improved the performance of all models, with robust predictions obtained using only ca. 2000 of the ca. 70,000 molecular features per sample. Additionally, we assessed the impact of chromatographic retention time on prediction accuracy and explored the key molecular features contributing to terrestrial DOM signatures using Shapley values and permutation importance (for RFs). Our study is a blueprint for applying ML to enhance the analysis of high-resolution mass spectrometry data, offering a scalable approach for predicting information important for understanding marine DOM chemistry. |
Permanent UFZ link | https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=31069 |
Bareth, M., Koch, B.P., Zachmann, G., Kong, X., Lechtenfeld, O.J., Maneth, S. (2025): Optimizing machine learning-based prediction of terrestrial dissolved organic matter in the ocean using fluorescence and LC-FTMS data. ACS Omega 10 (27), 29497-29509. 10.1021/acsomega.5c02849 |
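The abstract names two computable ingredients of the best-performing setup: sum normalization of the molecular feature intensities and the NRMSE used to score predictions. The following is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes a samples-by-features intensity matrix, a Gaussian GLM with identity link (equivalent to ordinary least squares with an intercept), and an NRMSE normalized by the range of the observed values; the abstract does not state which denominator the study used. All function names are hypothetical.

```python
import numpy as np


def sum_normalize(intensities: np.ndarray) -> np.ndarray:
    """Scale each row (sample) so that its feature intensities sum to 1."""
    totals = intensities.sum(axis=1, keepdims=True).astype(float)
    # Guard against all-zero samples to avoid division by zero.
    totals[totals == 0] = 1.0
    return intensities / totals


def nrmse(y_true, y_pred) -> float:
    """RMSE divided by the range of y_true (assumed convention), as a fraction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(rmse / (y_true.max() - y_true.min()))


def fit_gaussian_glm(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Gaussian GLM with identity link, i.e. least squares with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Intercept plus linear combination of the features."""
    return coef[0] + X @ coef[1:]


if __name__ == "__main__":
    # Synthetic stand-in data: 50 samples, 5 features (hypothetical, not study data).
    rng = np.random.default_rng(0)
    X = sum_normalize(rng.random((50, 5)) * 1000)
    y = X @ np.array([1.0, 2.0, 3.0, 4.0, 5.0]) + 0.5
    coef = fit_gaussian_glm(X, y)
    print(f"NRMSE: {nrmse(y, predict(coef, X)):.2%}")
```

Because sum-normalized rows add up to 1, the intercept column is collinear with the features; `np.linalg.lstsq` then returns the minimum-norm solution, which still gives well-defined predictions. A real workflow would also evaluate on held-out samples rather than on the training data.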