Publication Details

Category Text Publication
Reference Category Book chapters
DOI 10.1109/BigData55660.2022.10020732
Title (Primary) Preserving cluster features in imputing high dimensional data with extensive missing rate
Title (Secondary) 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17-20 December 2022
Author Lai, C.; Poschen, C.; Steinheuer, L.M.; Hackermüller, J.
Year 2022
Department MOLSYB; BIOINF
Page From 5295
Page To 5304
Language English
Topic T9 Healthy Planet
Keywords missing data; data imputation; support vector machine; convolution operation; neural network; global/local features; clustering
Abstract Imputing a large dataset in which the majority of values are missing across tens of thousands of predictors, while still preserving a known cluster structure in the imputed results, is a daunting task. In this study, we propose a novel two-step approach for this task. First, we use simple linear classification models to derive the global and local features of cluster structures from a template ground truth (i.e., known gold data). Second, we integrate the cluster features extracted from each cluster into a neural network architecture and its training responses to guide the imputation process. Because our neural network uses the global and local features of the gold data in training the imputation network, we refer to it as GLIN (Global-Local Imputation Network). We test our imputation method on two high-dimensional datasets, a single-cell dataset and a movie rating dataset, with missing rates of up to 95%. Finally, we use four evaluation metrics (distance, correlation, data distribution, and predictability difference) to evaluate how well the cluster structures of the gold data are preserved in the imputed results.
Persistent UFZ Identifier https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=28048
Lai, C., Poschen, C., Steinheuer, L.M., Hackermüller, J. (2022):
Preserving cluster features in imputing high dimensional data with extensive missing rate
2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17-20 December 2022
Institute of Electrical and Electronics Engineers (IEEE), New York, NY, p. 5295 - 5304, DOI 10.1109/BigData55660.2022.10020732