Publication details

Category Text publication
Reference type Book chapter
DOI 10.1109/BigData55660.2022.10020732
Title (primary) Preserving cluster features in imputing high dimensional data with extensive missing rate
Title (secondary) 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17-20 December 2022
Author Lai, C.; Poschen, C.; Steinheuer, L.M.; Hackermüller, J.
Year of publication 2022
Department MOLSYB; BIOINF
Page from 5295
Page to 5304
Language English
Topic T9 Healthy Planet
Keywords missing data; data imputation; support vector machine; convolution operation; neural network; global/local features; clustering
Abstract It is a daunting task to impute a large dataset in which the majority of the data is missing across tens of thousands of predictors, while still preserving a known cluster structure in the imputed results. In this study, we propose a novel two-step approach for this task. First, we use simple linear classification models to derive the global and local features of cluster structures from a template ground truth (i.e., known gold data). Second, we integrate the cluster features extracted from each cluster into a neural network architecture and its training responses to guide the imputation process. Since our neural network utilizes the global and local features of the gold data in training the imputation network, we refer to it as GLIN (Global-Local Imputation Network). We test our imputation method on two high-dimensional datasets, a single-cell dataset and a movie rating dataset, with missing rates of up to 95%. Finally, we use four evaluation metrics (distance, correlation, data distribution, and predictability difference) to evaluate how well the cluster structures of the gold data are preserved in the imputed results.
Permanent UFZ link https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=28048
Lai, C., Poschen, C., Steinheuer, L.M., Hackermüller, J. (2022):
Preserving cluster features in imputing high dimensional data with extensive missing rate
2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17-20 December 2022
Institute of Electrical and Electronics Engineers (IEEE), New York, NY, pp. 5295-5304, DOI 10.1109/BigData55660.2022.10020732