GAIN: Missing Data Imputation Using Generative Adversarial Nets

Abstract

We propose a novel method for imputing missing data by adapting the well-known Generative Adversarial Nets (GAN) framework. Accordingly, we call our method Generative Adversarial Imputation Nets (GAIN). The generator (G) observes some components of a real data vector, imputes the missing components conditioned on what is actually observed, and outputs a completed vector. The discriminator (D) then takes a completed vector and attempts to determine which components were actually observed and which were imputed. To ensure that D forces G to learn the desired distribution, we provide D with some additional information in the form of a hint vector. The hint reveals to D partial information about the missingness of the original sample, which is used by D to focus its attention on the imputation quality of particular components. This hint ensures that G does in fact learn to generate according to the true data distribution. We tested our method on various datasets and found that GAIN significantly outperforms state-of-the-art imputation methods.
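The data flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the single-layer `generator` and `discriminator`, the weight shapes, the `hint_rate` parameter, and the use of 0.5 to mark "not revealed" entries in the hint are all assumptions made for this example, and the toy data is fully known with a simulated mask so the result can be checked. No training loop is shown; the sketch covers only one forward pass and the discriminator's cross-entropy loss against the true mask.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def generator(x, mask, noise, w_g):
    # G sees the observed values (noise in the missing slots) together with the mask.
    g_in = np.concatenate([mask * x + (1.0 - mask) * noise, mask], axis=1)
    return sigmoid(g_in @ w_g)  # hypothetical one-layer network

def complete(x, mask, generated):
    # Keep observed components; fill only the missing ones from G's output.
    return mask * x + (1.0 - mask) * generated

def make_hint(mask, hint_rate, rng):
    # Assumed hint scheme: reveal each mask entry with probability hint_rate,
    # and use 0.5 for entries whose missingness is withheld from D.
    reveal = rng.random(mask.shape) < hint_rate
    return np.where(reveal, mask, 0.5)

def discriminator(x_completed, hint, w_d):
    # D outputs, per component, the probability that the value was observed.
    d_in = np.concatenate([x_completed, hint], axis=1)
    return sigmoid(d_in @ w_d)  # hypothetical one-layer network

rng = np.random.default_rng(0)
n, d = 8, 4
x = rng.random((n, d))                            # toy data in [0, 1]
mask = (rng.random((n, d)) < 0.7).astype(float)   # 1 = observed, 0 = missing
noise = rng.random((n, d)) * 0.01
w_g = rng.normal(scale=0.1, size=(2 * d, d))
w_d = rng.normal(scale=0.1, size=(2 * d, d))

generated = generator(x, mask, noise, w_g)
x_completed = complete(x, mask, generated)
hint = make_hint(mask, hint_rate=0.9, rng=rng)
d_prob = discriminator(x_completed, hint, w_d)

# D's objective in this sketch: per-component cross-entropy against the true mask.
eps = 1e-8
d_loss = -np.mean(mask * np.log(d_prob + eps)
                  + (1.0 - mask) * np.log(1.0 - d_prob + eps))
print(x_completed.shape, d_prob.shape, float(d_loss))
```

In this sketch the discriminator's target is the mask itself rather than a single real/fake label, which mirrors the abstract's description of D deciding component by component which values were observed and which were imputed.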

Cite

Text

Yoon et al. "GAIN: Missing Data Imputation Using Generative Adversarial Nets." International Conference on Machine Learning, 2018.

Markdown

[Yoon et al. "GAIN: Missing Data Imputation Using Generative Adversarial Nets." International Conference on Machine Learning, 2018.](https://mlanthology.org/icml/2018/yoon2018icml-gain/)

BibTeX

@inproceedings{yoon2018icml-gain,
  title     = {{GAIN: Missing Data Imputation Using Generative Adversarial Nets}},
  author    = {Yoon, Jinsung and Jordon, James and van der Schaar, Mihaela},
  booktitle = {International Conference on Machine Learning},
  year      = {2018},
  pages     = {5689--5698},
  volume    = {80},
  url       = {https://mlanthology.org/icml/2018/yoon2018icml-gain/}
}