Learning Invariant Representations with Missing Data

Abstract

Spurious correlations, or *shortcuts*, allow flexible models to predict well during training but poorly on related test populations. Recent work has shown that models satisfying particular independencies involving the correlation-inducing *nuisance* variable have guarantees on their test performance. However, enforcing such independencies requires the nuisance to be observed during training, and nuisances such as demographics or image background labels are often missing. Enforcing independence on just the observed data does not imply independence on the entire population. In this work, we derive the missing-mmd estimator, an estimator of the maximum mean discrepancy (MMD) used for invariance objectives under missing nuisances. On simulations and clinical data, missing-mmds enable improvements in test performance similar to those achieved with fully observed data.
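The claim that independence on the observed data does not imply independence on the whole population can be illustrated with a small simulation. The sketch below is not the paper's missing-mmd estimator: it uses a generic biased (V-statistic) squared-MMD estimate with an RBF kernel, and the helper names `rbf_kernel` and `mmd2`, the bandwidth, the simulated data, and the deliberately contrived missingness rule are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """RBF kernel matrix between 1-D sample arrays a and b."""
    sq_dists = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())

rng = np.random.default_rng(0)
n = 2000
z = rng.integers(0, 2, size=n)          # binary nuisance (hypothetical)
r = z + 0.5 * rng.standard_normal(n)    # representation that depends on z

# Contrived missingness (an assumption for illustration): z is recorded
# only when r falls in a narrow band, which hides the dependence on z.
observed = np.abs(r - 0.5) < 0.2

# Compare the distribution of r across the two nuisance groups.
mmd_full = mmd2(r[z == 0], r[z == 1])
mmd_observed = mmd2(r[observed & (z == 0)], r[observed & (z == 1)])

print(f"squared MMD, full population: {mmd_full:.3f}")
print(f"squared MMD, observed only:   {mmd_observed:.3f}")
```

In this toy setup the observed-only estimate is close to zero even though the representation clearly depends on the nuisance in the full population, so an objective that drives only the observed-data MMD to zero would not detect the dependence.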

Cite

Text

Goldstein et al. "Learning Invariant Representations with Missing Data." Proceedings of the First Conference on Causal Learning and Reasoning, 2022.

Markdown

[Goldstein et al. "Learning Invariant Representations with Missing Data." Proceedings of the First Conference on Causal Learning and Reasoning, 2022.](https://mlanthology.org/clear/2022/goldstein2022clear-learning/)

BibTeX

@inproceedings{goldstein2022clear-learning,
  title     = {{Learning Invariant Representations with Missing Data}},
  author    = {Goldstein, Mark and Jacobsen, Joern-Henrik and Chau, Olina and Saporta, Adriel and Puli, Aahlad Manas and Ranganath, Rajesh and Miller, Andrew},
  booktitle = {Proceedings of the First Conference on Causal Learning and Reasoning},
  year      = {2022},
  pages     = {290--301},
  volume    = {177},
  url       = {https://mlanthology.org/clear/2022/goldstein2022clear-learning/}
}