Publication Details

Category Text Publication
Reference Category Journals
DOI 10.5194/hess-29-5005-2025
Licence creative commons licence
Title (Primary) How well do process-based and data-driven hydrological models learn from limited discharge data?
Author Staudinger, M.; Herzog, A.; Loritz, R.; Houska, T.; Pool, S.; Spieler, D.; Wagner, P.D.; Mai, J.; Kiesel, J.; Thober, S.; Guse, B.; Ehret, U.
Source Title Hydrology and Earth System Sciences
Year 2025
Department CHS
Volume 29
Issue 19
Page From 5005
Page To 5029
Language English
Topic T5 Future Landscapes
Data and Software links https://doi.org/10.2909/960998c1-1870-4e82-8051-6485205ebbac
https://doi.org/10.5281/zenodo.14938050
Supplements Supplement 1
Abstract It is widely assumed that data-driven models achieve good results only with sufficiently large training data, whereas process-based models are usually expected to be superior in data-poor situations. To investigate this, we calibrated several process-based and data-driven hydrological models using training datasets of observed discharge that differed in terms of both the number of data points and the type of data selection, allowing us to make a systematic comparison of the learning behaviour of the different model types. Four data-driven models (conditional probability distributions, regression trees, artificial neural networks, and long short-term memory networks) and three process-based models (GR4J, HBV, and SWAT+) were included in the testing, applied in three meso-scale catchments representing different landscapes in Germany: the Iller in the Alpine region, the Saale in the low mountain ranges, and the Selke in the transition between the Harz and central German lowlands. We used information measures (joint entropy and conditional entropy) for system analysis and model performance evaluation because they offer several desirable properties: they extend seamlessly from uni- to multivariate data, they allow direct comparison of predictive uncertainty with and without model simulations, and their boundedness helps to put results into perspective. In addition to the main question of this study – to what extent does the performance of different models depend on the training dataset? – we investigated whether the selection of training data (random, according to information content, contiguous time periods, or independent time points) plays a role. We also examined whether the shape of the learning curve for different models can be used to predict the achievable model performance based on the information contained in the data and whether using more spatially distributed model inputs improves model performance compared to using spatially lumped inputs. 
Process-based models outperformed data-driven ones for small amounts of training data due to their predefined structure. However, as the amount of training data increases, the learning curve of process-based models quickly saturates, and data-driven models become more effective. In particular, the long short-term memory network outperforms all process-based models when trained with more than 2–5 years of data and continues to learn from additional training data without approaching saturation. Surprisingly, fully random sampling of training data points for the HBV model led to better learning results than consecutive random sampling or optimal sampling in terms of information content. Analysing multivariate catchment data allows predictions about how these data can be used to predict discharge. When no memory was considered, the conditional entropy was high. However, as soon as memory was introduced in the form of the previous day or week, the conditional entropy decreased, suggesting that memory is an important component of the data and that capturing it improves model performance. This was particularly evident in the catchments in the low mountain ranges and the Alpine region.
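The information measures named in the abstract can be illustrated with a minimal sketch. It assumes all variables have already been binned into discrete classes and uses simple plug-in (frequency-count) estimates; the function names, the toy series `q` and `p`, and the one-step lag are hypothetical choices for illustration and are not taken from the paper's code or data.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a sequence of hashable, binned values."""
    n = len(samples)
    counts = Counter(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def conditional_entropy(target, predictors):
    """H(target | predictors) = H(target, predictors) - H(predictors)."""
    target = list(target)
    predictors = list(predictors)
    joint = list(zip(target, predictors))
    return entropy(joint) - entropy(predictors)


# Hypothetical binned series: discharge class q and precipitation class p.
q = [0, 0, 1, 2, 1, 0, 0, 1, 2, 1, 0, 0]
p = [0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1]

target = q[1:]                            # discharge at time t
no_memory = p[1:]                         # predictor: precipitation at time t only
with_memory = list(zip(p[1:], q[:-1]))    # predictor: precipitation at t and discharge at t-1

h_no_memory = conditional_entropy(target, no_memory)
h_with_memory = conditional_entropy(target, with_memory)

print(f"H(q_t | p_t)          = {h_no_memory:.3f} bits")
print(f"H(q_t | p_t, q_(t-1)) = {h_with_memory:.3f} bits")
```

With plug-in estimates, adding a predictor can never increase the conditional entropy, so the second value is at most the first. On short series, part of any drop reflects sparsely populated bins rather than real memory, so a sketch like this only shows the direction of the effect described in the abstract.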
Persistent UFZ Identifier https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=31528
Staudinger, M., Herzog, A., Loritz, R., Houska, T., Pool, S., Spieler, D., Wagner, P.D., Mai, J., Kiesel, J., Thober, S., Guse, B., Ehret, U. (2025):
How well do process-based and data-driven hydrological models learn from limited discharge data?
Hydrol. Earth Syst. Sci. 29 (19), 5005-5029, 10.5194/hess-29-5005-2025