Avoiding Spurious Correlations: Bridging Theory and Practice

Abstract

Distribution shifts in the wild jeopardize the performance of machine learning models, as models tend to pick up spurious correlations during training. Recent work (Nagarajan et al., 2020) has characterized two specific failure modes of out-of-distribution (OOD) generalization, and we extend this theoretical framework by interpreting existing algorithms as solutions to these failure modes. We then evaluate them on different image classification datasets, and in the process surface two issues that are central to existing robustness techniques. For the algorithms that require access to group information, we demonstrate how the existing annotations included in standard OOD benchmarks are unable to fully capture the spurious correlations present. For methods that do not rely on group annotations during training, the validation set they use for model selection carries assumptions that are not realistic in real-world settings. This leads us to explore how the choice of distribution shifts represented by validation data affects the effectiveness of different OOD robustness algorithms.

Cite

Text

Nguyen et al. "Avoiding Spurious Correlations: Bridging Theory and Practice." NeurIPS 2021 Workshops: DistShift, 2021.

Markdown

[Nguyen et al. "Avoiding Spurious Correlations: Bridging Theory and Practice." NeurIPS 2021 Workshops: DistShift, 2021.](https://mlanthology.org/neuripsw/2021/nguyen2021neuripsw-avoiding/)

BibTeX

@inproceedings{nguyen2021neuripsw-avoiding,
  title     = {{Avoiding Spurious Correlations: Bridging Theory and Practice}},
  author    = {Nguyen, Thao and Nagarajan, Vaishnavh and Sedghi, Hanie and Neyshabur, Behnam},
  booktitle = {NeurIPS 2021 Workshops: DistShift},
  year      = {2021},
  url       = {https://mlanthology.org/neuripsw/2021/nguyen2021neuripsw-avoiding/}
}