Information Theoretic Approaches for Testing Missingness in Predictive Models

Abstract

In predictive modeling, missing data can bias the learned model even when imputation is applied. It is therefore important to assess the process that generates the missingness. We present two hypothesis tests for assessing what the missingness pattern depends on: MI-MCAR (mutual information for missing completely at random) and MI-US (mutual information for unobserved sources). MI-MCAR tests marginal independence between the missingness pattern and the data matrix, while MI-US is a conditional randomization test (CRT) of whether the missingness pattern depends on unobserved sources. Both methods can be applied to heterogeneous data types and can identify missingness pathologies that specifically affect performance on regression tasks. We evaluate our methods on simulated and pseudo-simulated datasets and show that they identify data whose missingness is due to unobserved sources.
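As a rough illustration of the idea behind a marginal independence test such as MI-MCAR, the sketch below runs a permutation test on the mutual information between a binary missingness indicator and a single discrete observed variable. It is not the authors' implementation: the plug-in estimator, the permutation null, the restriction to one discrete variable, and the function names `plugin_mi` and `mi_permutation_test` are all assumptions made for this example.

```python
import numpy as np


def plugin_mi(a, b):
    """Plug-in mutual information (in nats) between two discrete 1-D arrays."""
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    # Empirical joint distribution over (a, b) value pairs.
    joint = np.zeros((a_vals.size, b_vals.size))
    np.add.at(joint, (a_idx, b_idx), 1.0)
    joint /= joint.sum()
    # Product of the empirical marginals.
    indep = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / indep[nz])))


def mi_permutation_test(mask, x, n_perm=500, seed=0):
    """Test H0: the missingness indicator `mask` is independent of `x`.

    Hypothetical helper: permuting `mask` breaks any dependence on `x`,
    which gives a null distribution for the mutual information statistic.
    """
    rng = np.random.default_rng(seed)
    observed = plugin_mi(mask, x)
    null = np.array(
        [plugin_mi(rng.permutation(mask), x) for _ in range(n_perm)]
    )
    # The +1 correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.integers(0, 3, size=2000)
    # Missingness probability grows with x, so the mask is not MCAR.
    mask = (rng.random(2000) < 0.1 + 0.3 * x).astype(int)
    mi, p = mi_permutation_test(mask, x)
    print(f"MI = {mi:.4f} nats, permutation p-value = {p:.4f}")
```

A small p-value here is evidence against marginal independence for that one variable. Continuous or mixed-type data, and the conditional randomization test used by MI-US, would need a different mutual information estimator and a conditional sampling scheme that this sketch does not attempt.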

Cite

Text

Bhave et al. "Information Theoretic Approaches for Testing Missingness in Predictive Models." ICML 2020 Workshops: Artemiss, 2020.

Markdown

[Bhave et al. "Information Theoretic Approaches for Testing Missingness in Predictive Models." ICML 2020 Workshops: Artemiss, 2020.](https://mlanthology.org/icmlw/2020/bhave2020icmlw-information/)

BibTeX

@inproceedings{bhave2020icmlw-information,
  title     = {{Information Theoretic Approaches for Testing Missingness in Predictive Models}},
  author    = {Bhave, Shreyas A and Ranganath, Rajesh and Perotte, Adler},
  booktitle = {ICML 2020 Workshops: Artemiss},
  year      = {2020},
  url       = {https://mlanthology.org/icmlw/2020/bhave2020icmlw-information/}
}