Publication Details

Category Text Publication
Reference Category Journals
DOI 10.1111/geb.13695
Licence creative commons licence
Title (Primary) Imputing missing data in plant traits: A guide to improve gap-filling
Author Joswig, J.S.; Kattge, J.; Kraemer, G.; Mahecha, M.D.; Rüger, N.; Schaepman, M.E.; Schrodt, F.; Schuman, M.C.
Source Title Global Ecology and Biogeography
Year 2023
Department iDiv; RS
Volume 32
Issue 8
Page From 1395
Page To 1408
Language English
Topic T5 Future Landscapes
Data and Software links https://doi.org/10.17871/TRY.96
Supplements https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1111%2Fgeb.13695&file=geb13695-sup-0001-AppendixS1.zip
Keywords Bayesian hierarchical model; gap-filling; imputation; induced pattern; machine learning; matrix factorization; plant functional trait; sensitivity analysis; sparse matrix; TRY
Abstract

Aim

Globally distributed plant trait data are increasingly used to understand relationships between biodiversity and ecosystem processes. However, global trait databases are sparse because they are compiled from many, mostly small databases. This sparsity in both trait space completeness and geographical distribution limits the potential for both multivariate and global analyses. Thus, ‘gap-filling’ approaches are often used to impute missing trait data. Recent methods, like Bayesian hierarchical probabilistic matrix factorization (BHPMF), can impute large and sparse data sets using side information. We investigate whether BHPMF imputation leads to biases in trait space and identify aspects influencing bias to provide guidance for its usage.

Innovation

We use a fully observed trait data set from which entries are randomly removed, along with extensive but sparse additional data. We use BHPMF for imputation and evaluate bias by: (1) accuracy (residuals, RMSE, trait means), (2) correlations (bi- and multivariate) and (3) taxonomic and functional clustering (valuewise, uni- and multivariate). BHPMF preserves general patterns of trait distributions but induces taxonomic clustering. Data set–external trait data had little effect on induced taxonomic clustering and stabilized trait–trait correlations.
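The masking-and-evaluation design described above can be illustrated with a minimal sketch. It is not the authors' code and does not implement BHPMF: a simple trait-mean imputer stands in for it, the data are synthetic, and all function names, the 30% gap fraction, and the data dimensions are assumptions chosen for illustration. It shows only two of the listed checks, RMSE on the removed entries and the shift in trait–trait correlations.

```python
import numpy as np


def mask_entries(X, frac, rng):
    """Return a copy of X with roughly `frac` of its entries set to NaN, plus the mask."""
    mask = rng.random(X.shape) < frac
    X_sparse = X.copy()
    X_sparse[mask] = np.nan
    return X_sparse, mask


def impute_column_mean(X_sparse):
    """Stand-in imputer (NOT BHPMF): fill each gap with that trait's observed mean."""
    col_means = np.nanmean(X_sparse, axis=0)
    return np.where(np.isnan(X_sparse), col_means, X_sparse)


def rmse_on_masked(X_true, X_imputed, mask):
    """RMSE computed only over the entries that were removed."""
    resid = X_imputed[mask] - X_true[mask]
    return float(np.sqrt(np.mean(resid ** 2)))


def correlation_shift(X_true, X_imputed):
    """Largest absolute change in any pairwise trait-trait Pearson correlation."""
    c_true = np.corrcoef(X_true, rowvar=False)
    c_imp = np.corrcoef(X_imputed, rowvar=False)
    return float(np.max(np.abs(c_true - c_imp)))


rng = np.random.default_rng(0)

# Synthetic, fully observed data: 200 individuals x 3 correlated traits (assumed sizes).
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.3 * rng.normal(size=(200, 1)) for _ in range(3)])

X_sparse, mask = mask_entries(X, frac=0.3, rng=rng)
X_imp = impute_column_mean(X_sparse)

rmse = rmse_on_masked(X, X_imp, mask)
shift = correlation_shift(X, X_imp)
print(f"RMSE on removed entries: {rmse:.3f}")
print(f"Max change in trait-trait correlation: {shift:.3f}")
```

Because the true values of the removed entries are known, an imputer can look acceptable on RMSE while still distorting the correlation structure, which is why the sketch reports both quantities separately.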

Main Conclusions

Our study extends the criteria for the evaluation of gap-filling beyond RMSE, providing insight into statistical data structure and allowing better informed use of imputed trait data, with improved practice for imputation. We expect our findings to be valuable beyond applications in plant ecology, for any study using hierarchical side information for imputation.

Persistent UFZ Identifier https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=27091
Joswig, J.S., Kattge, J., Kraemer, G., Mahecha, M.D., Rüger, N., Schaepman, M.E., Schrodt, F., Schuman, M.C. (2023):
Imputing missing data in plant traits: A guide to improve gap-filling
Glob. Ecol. Biogeogr. 32 (8), 1395 - 1408, DOI 10.1111/geb.13695