Publication Details

Category Text Publication
Reference Category Journals
DOI 10.1021/acsomega.5c02849
Licence Creative Commons licence
Title (Primary) Optimizing machine learning-based prediction of terrestrial dissolved organic matter in the ocean using fluorescence and LC-FTMS data
Author Bareth, M.; Koch, B.P.; Zachmann, G.; Kong, X.; Lechtenfeld, O.J.; Maneth, S.
Source Title ACS Omega
Year 2025
Department EAC
Volume 10
Issue 27
Page From 29497
Page To 29509
Language English
Topic T9 Healthy Planet
Data and Software links https://doi.pangaea.de/10.1594/PANGAEA.948019
Supplements https://ndownloader.figstatic.com/files/55952115
Abstract Marine dissolved organic matter (DOM) is an extremely complex mixture of organic compounds that plays a crucial role in the global carbon cycle. In the Arctic, climate change accelerates the release of terrestrial organic carbon. Since chemical information is the only way to track DOM sources and fate, it is essential to improve analytical and data science approaches to assess the DOM composition. Here, we compare random forest (RF), support vector machines, and generalized linear models (GLM) to predict a fluorescence-derived proxy for terrestrial DOM based on molecular formula data from liquid chromatography coupled with Fourier transform mass spectrometry (LC-FTMS). We systematically evaluate different data preprocessing, normalization, and ML techniques to optimize prediction accuracy and computational efficiency. Our results show that a GLM with sum normalization provides the most accurate and efficient predictions, achieving a normalized root-mean-square error (NRMSE) of 5.7%, close to the precision of the fluorescence measurement. The prediction based on RF regression was slightly less accurate and required significantly more computation time than GLM, but it was the most robust against data preprocessing and independent of linear correlations. Feature selection significantly improved the performance of all models, with robust predictions obtained using only ca. 2000 of the ca. 70,000 molecular features per sample. Additionally, we assessed the impact of chromatographic retention time on prediction accuracy and explored the key molecular features contributing to terrestrial DOM signatures using Shapley values and permutation importance (for RFs). Our study is a blueprint for the application of ML to enhance the analysis of high-resolution mass spectrometry data, offering a scalable approach for predicting information important for the understanding of marine DOM chemistry.
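The abstract names two quantities without defining them: sum normalization and the NRMSE. The sketch below shows one common reading of each, purely as an illustration. It is not taken from the paper: the function names are hypothetical, and dividing the RMSE by the observed range is an assumption (normalizing by the mean is another widespread convention, and the record does not say which one the authors used).

```python
import math


def sum_normalize(intensities):
    """Scale one sample's peak intensities so that they sum to 1.

    Assumption: "sum normalization" means dividing each feature intensity
    by the total intensity of its sample.
    """
    total = sum(intensities)
    if total == 0:
        raise ValueError("sample has no nonzero intensity")
    return [value / total for value in intensities]


def nrmse(observed, predicted):
    """Root-mean-square error divided by the range of the observed values.

    Assumption: range normalization; the paper may normalize differently.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(squared_errors) / len(observed))
    span = max(observed) - min(observed)
    if span == 0:
        raise ValueError("observed values have zero range")
    return rmse / span
```

Under these assumptions, `sum_normalize([1.0, 3.0])` yields relative intensities of 0.25 and 0.75, and an NRMSE of 0.057 would correspond to the 5.7% figure quoted in the abstract.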
Persistent UFZ Identifier https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=31069
Bareth, M., Koch, B.P., Zachmann, G., Kong, X., Lechtenfeld, O.J., Maneth, S. (2025):
Optimizing machine learning-based prediction of terrestrial dissolved organic matter in the ocean using fluorescence and LC-FTMS data
ACS Omega 10 (27), 29497 - 29509, 10.1021/acsomega.5c02849