Publication Details |
Category | Text Publication |
Reference Category | Book chapters |
DOI | 10.1109/BigData55660.2022.10020732 |
Title (Primary) | Preserving cluster features in imputing high dimensional data with extensive missing rate |
Title (Secondary) | 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17-20 December 2022 |
Author | Lai, C.; Poschen, C.; Steinheuer, L.M.; Hackermüller, J. |
Year | 2022 |
Department | MOLSYB; BIOINF |
Page From | 5295 |
Page To | 5304 |
Language | English |
Topic | T9 Healthy Planet |
Keywords | missing data; data imputation; support vector machine; convolution operation; neural network; global/local features; clustering |
Abstract | Imputing a large dataset in which the majority of values are missing across tens of thousands of predictors, while still preserving a known cluster structure in the imputed results, is a daunting task. In this study, we propose a novel two-step approach for this task. First, we use simple linear classification models to derive the global and local features of cluster structures from a template ground truth (i.e., known gold data). Second, we integrate the cluster features extracted from each cluster into a neural network architecture and its training responses to guide the imputation process. Because our neural network uses the global and local features of the gold data when training the imputation network, we refer to it as GLIN (Global-Local Imputation Network). We test our imputation method on two high-dimensional datasets, a single-cell dataset and a movie rating dataset, with missing rates of up to 95%. Finally, we use four evaluation metrics (distance, correlation, data distribution, and predictability difference) to evaluate how well the cluster structures of the gold data are preserved in the imputed results. |
Persistent UFZ Identifier | https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=28048 |
Lai, C., Poschen, C., Steinheuer, L.M., Hackermüller, J. (2022): Preserving cluster features in imputing high dimensional data with extensive missing rate. 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17-20 December 2022. Institute of Electrical and Electronics Engineers (IEEE), New York, NY, p. 5295 - 5304. 10.1109/BigData55660.2022.10020732 |