|Title (Primary)||A comparison of calibration sampling schemes at the field scale|
|Author||Schmidt, K.; Behrens, T.; Daumann, J.; Ramirez-Lopez, L.; Werban, U. ; Dietrich, P. ; Scholten, T.|
|Keywords||Weighted Latin hypercube sampling; Fuzzy k-means sampling; Response surface sampling; Soil sensing; Random forest regression|
|UFZ wide themes||TERENO; RU5;|
High-resolution digital soil sensing and mapping is an important emerging technology that helps meet the strong and growing global demand for high-resolution soil property data. However, combining geophysical sensing with pedometrical techniques to produce soil property maps is complex and requires a well-structured design, from the initial steps of data collection right through to final model validation. In this study, we compare different sampling design strategies (an extension of conditioned Latin hypercube sampling, fuzzy k-means sampling and response surface sampling) as a basis for predicting soil texture, soil organic carbon and soil pH at two soil depth intervals using electromagnetic induction (EM38DD and EM31) and gamma spectroscopy (U, K, Th) data. Two different sample set sizes, two different regression approaches (multiple linear least squares and random forests), as well as several resampling and independent validation approaches are compared. In addition to these real-world datasets, we also compare the investigated methods on two comparable simulated datasets.
Our accuracy estimates indicate that the combination of Latin hypercube sampling and random forest regression performs best and should be adopted. This holds for both the real-world examples and the two synthetic datasets. The analysis indicates that this stems from the optimized spread of the samples within the state space of the sensors.
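The idea of spreading samples across the sensor state space can be illustrated with a minimal sketch. It is not the authors' weighted or conditioned Latin hypercube implementation: it swaps in a simple hill-climbing search (rather than simulated annealing), and the names `stratum_edges`, `coverage_penalty` and `lhs_subsample` are hypothetical. The sketch assumes a candidate matrix `X` with one row per sensed location and one column per sensor covariate, and tries to pick `k` rows so that each of the `k` quantile strata of every covariate holds exactly one sample.

```python
import numpy as np

def stratum_edges(X, k):
    """Interior quantile edges that split each covariate into k equal-probability strata."""
    probs = np.linspace(0.0, 1.0, k + 1)[1:-1]
    return [np.quantile(X[:, j], probs) for j in range(X.shape[1])]

def coverage_penalty(X, idx, edges):
    """Sum over covariates of |samples per stratum - 1|; 0 means a perfect Latin hypercube."""
    k = len(idx)
    total = 0
    for j, e in enumerate(edges):
        strata = np.digitize(X[idx, j], e)          # stratum index in 0..k-1
        counts = np.bincount(strata, minlength=k)   # samples falling in each stratum
        total += int(np.abs(counts - 1).sum())
    return total

def lhs_subsample(X, k, n_iter=2000, seed=0):
    """Hill-climbing search (an assumption, not simulated annealing) for k well-spread rows."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    edges = stratum_edges(X, k)
    idx = rng.choice(n, size=k, replace=False)      # random starting design
    best = coverage_penalty(X, idx, edges)
    for _ in range(n_iter):
        if best == 0:
            break
        cand = idx.copy()
        pool = np.setdiff1d(np.arange(n), cand)     # rows not yet in the design
        cand[rng.integers(k)] = rng.choice(pool)    # swap one sample for an unused row
        score = coverage_penalty(X, cand, edges)
        if score <= best:                           # keep swaps that do not worsen coverage
            idx, best = cand, score
    return idx, best
```

Calling `lhs_subsample(X, k)` returns the selected row indices and the remaining penalty; a lower penalty means the design covers the marginal distributions of the sensor readings more evenly.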
Iterative LHS subsampling with increasing sample set sizes may be a successful approach for incrementally analyzing and validating the model, and can thus help reduce laboratory costs once a desired accuracy level is reached.
A comparison of the different validation approaches reveals their complexity and highlights the necessity for adequate independent validation approaches. However, based on the findings of our study, we recommend ‘leave-group-out’ cross-validation and ‘.632 bootstrapping’ as the most suitable accuracy estimators.
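For readers unfamiliar with the .632 bootstrap, the following is a minimal sketch of the estimator, which weights the optimistic resubstitution error by 0.368 and the pessimistic out-of-bag error by 0.632. Several choices here are assumptions for illustration only: ordinary least squares stands in for the regression model, mean squared error is the accuracy measure, the out-of-bag error is averaged per bootstrap replicate (a common simplification), and the function names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients with an intercept (stand-in for any regression model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

def bootstrap_632(X, y, n_boot=200, seed=0):
    """Return (.632 estimate, resubstitution error, mean out-of-bag error)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Resubstitution error: fit and evaluate on the full calibration set.
    resub = mse(y, predict_ols(fit_ols(X, y), X))
    oob_errors = []
    for _ in range(n_boot):
        boot = rng.integers(0, n, size=n)           # draw n rows with replacement
        oob = np.setdiff1d(np.arange(n), boot)      # rows left out of this replicate
        if oob.size == 0:
            continue
        coef = fit_ols(X[boot], y[boot])
        oob_errors.append(mse(y[oob], predict_ols(coef, X[oob])))
    oob_err = float(np.mean(oob_errors))
    return 0.368 * resub + 0.632 * oob_err, resub, oob_err
```

The weights come from the fact that a bootstrap replicate contains on average about 63.2% of the distinct observations, so the combined estimate lies between the two component errors.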
Finally, this study shows that there are complex interactions between sampling design, regression approaches and validation approaches, which can greatly influence the final soil property maps and their accuracy estimates.
Future work should focus on a detailed analysis of Latin hypercube sampling and of why it outperformed the other approaches. To this end, comparisons with other sampling approaches, including specific ‘sampling-for-validation’ approaches, should be conducted. We therefore provide the simulated datasets as Supplementary reference material for future comparative analysis.
|Persistent UFZ Identifier||https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=14900|
|Schmidt, K., Behrens, T., Daumann, J., Ramirez-Lopez, L., Werban, U., Dietrich, P., Scholten, T. (2014):
A comparison of calibration sampling schemes at the field scale
Geoderma 232-234, 243-256