Leveraging Variational Autoencoders for Multiple Data Imputation

Abstract

Missing data persists as a major barrier to data analysis across numerous applications. Recently, deep generative models have been used for imputation of missing data, motivated by their ability to learn complex and non-linear relationships. In this work, we investigate the ability of variational autoencoders (VAEs) to account for uncertainty in missing data through multiple imputation. We find that VAEs provide poor empirical coverage of missing data, with underestimation and overconfident imputations. To overcome this, we employ $\beta$-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification. Assigning a good value of $\beta$ is critical for uncertainty calibration, and we demonstrate how this can be achieved using cross-validation. We assess three alternative methods for sampling from the posterior distribution of missing values and apply the approach to transcriptomics datasets with various simulated missingness scenarios. Finally, we show that single imputation in transcriptomic data can cause false discoveries in downstream tasks and that employing multiple imputation with $\beta$-VAEs can effectively mitigate these inaccuracies.

Cite

Text

Roskams-Hieter et al. "Leveraging Variational Autoencoders for Multiple Data Imputation." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43412-9_29

BibTeX

@inproceedings{roskamshieter2023ecmlpkdd-leveraging,
  title     = {{Leveraging Variational Autoencoders for Multiple Data Imputation}},
  author    = {Roskams-Hieter, Breeshey and Wells, Jude and Wade, Sara},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2023},
  pages     = {491--506},
  doi       = {10.1007/978-3-031-43412-9_29},
  url       = {https://mlanthology.org/ecmlpkdd/2023/roskamshieter2023ecmlpkdd-leveraging/}
}