Calibrated Ensembles Can Mitigate Accuracy Tradeoffs Under Distribution Shift

Abstract

We often see undesirable tradeoffs in robust machine learning, where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy. A robust classifier obtained via specialized techniques such as removing spurious features often has better OOD but worse ID accuracy compared to a standard classifier trained via vanilla ERM. In this paper, we find that a simple approach of ensembling the standard and robust models, after calibrating on only ID data, outperforms the prior state of the art both ID and OOD. On ten natural distribution shift datasets, ID-calibrated ensembles get the best of both worlds: the strong ID accuracy of the standard model and the OOD accuracy of the robust model. We analyze this method in stylized settings and identify two important conditions for ensembles to perform well both ID and OOD: (1) the standard and robust models should be calibrated (on ID data, because OOD data is unavailable), and (2) the OOD distribution has no anticorrelated spurious features.
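The core recipe in the abstract (calibrate each model on held-out ID data, then average the calibrated probabilities) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and temperature scaling is fit here by a simple grid search over the ID negative log-likelihood rather than by any particular optimizer the authors may have used.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fit_temperature(logits, labels, temps=np.linspace(0.25, 4.0, 100)):
    """Pick the temperature minimizing NLL on held-out ID data.

    A grid search stands in for a proper optimizer; this is only a sketch.
    """
    best_t, best_nll = 1.0, np.inf
    for t in temps:
        probs = softmax(logits / t)
        nll = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
        if nll < best_nll:
            best_t, best_nll = t, nll
    return best_t

def calibrated_ensemble(std_logits, robust_logits, t_std, t_robust):
    """Average the temperature-scaled probabilities of the two models."""
    p_std = softmax(std_logits / t_std)
    p_robust = softmax(robust_logits / t_robust)
    return 0.5 * (p_std + p_robust)

if __name__ == "__main__":
    # Toy example: overconfident synthetic logits for a 3-class problem.
    rng = np.random.default_rng(0)
    std_logits = 5.0 * rng.normal(size=(50, 3))
    robust_logits = 2.0 * rng.normal(size=(50, 3))
    labels = rng.integers(0, 3, size=50)

    # Temperatures are fit on ID validation data only (no OOD data needed).
    t_std = fit_temperature(std_logits, labels)
    t_robust = fit_temperature(robust_logits, labels)

    probs = calibrated_ensemble(std_logits, robust_logits, t_std, t_robust)
    print(probs.shape, probs.sum(axis=1)[:3])
```

The key point the sketch captures is that calibration uses only ID labels; the ensemble itself is just a uniform average of the two models' post-calibration probability vectors.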

Cite

Text

Kumar et al. "Calibrated Ensembles Can Mitigate Accuracy Tradeoffs Under Distribution Shift." Uncertainty in Artificial Intelligence, 2022.

Markdown

[Kumar et al. "Calibrated Ensembles Can Mitigate Accuracy Tradeoffs Under Distribution Shift." Uncertainty in Artificial Intelligence, 2022.](https://mlanthology.org/uai/2022/kumar2022uai-calibrated/)

BibTeX

@inproceedings{kumar2022uai-calibrated,
  title     = {{Calibrated Ensembles Can Mitigate Accuracy Tradeoffs Under Distribution Shift}},
  author    = {Kumar, Ananya and Ma, Tengyu and Liang, Percy and Raghunathan, Aditi},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2022},
  pages     = {1041--1051},
  volume    = {180},
  url       = {https://mlanthology.org/uai/2022/kumar2022uai-calibrated/}
}