Ask Your Distribution Shift if Pre-Training Is Right for You

Abstract

Pre-training is a widely used approach to develop models that are robust to distribution shifts. However, in practice, its effectiveness varies: fine-tuning a pre-trained model improves robustness significantly in some cases but *not at all* in others (compared to training from scratch). In this work, we seek to characterize the failure modes that pre-training *can* and *cannot* address. In particular, we focus on two possible failure modes of models under distribution shift: poor extrapolation (e.g., they cannot generalize to a different domain) and biases in the training data (e.g., they rely on spurious features). Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases. After providing theoretical motivation and empirical evidence for this finding, we explore an implication for developing robust models: fine-tuning on a (very) small, non-diverse but *de-biased* dataset can result in significantly more robust models than fine-tuning on a large and diverse but biased dataset.
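To make the comparison in the abstract concrete, below is a minimal sketch of fine-tuning a pre-trained model versus training the same architecture from scratch. It assumes a torchvision ResNet-50, an ImageNet checkpoint, and a generic classification training loop; the model choice, hyperparameters, `num_classes`, and `loader` are illustrative placeholders, not the paper's exact experimental setup.

import torch
import torchvision

# Hypothetical comparison: identical training, different initializations.
num_classes = 10  # placeholder; depends on the downstream task

# Initialization from pre-trained (ImageNet) weights, then adapt the head.
pretrained = torchvision.models.resnet50(weights="IMAGENET1K_V2")
pretrained.fc = torch.nn.Linear(pretrained.fc.in_features, num_classes)

# Random initialization ("training from scratch").
scratch = torchvision.models.resnet50(weights=None)
scratch.fc = torch.nn.Linear(scratch.fc.in_features, num_classes)

def train(model, loader, epochs=10, lr=1e-3):
    # Both models are trained identically on the same (possibly small,
    # de-biased) dataset, so any gap in out-of-distribution accuracy is
    # attributable to the initialization.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss_fn(model(images), labels).backward()
            optimizer.step()
    return model

Evaluating both trained models on a shifted test set is one way to probe the paper's rule of thumb: the pre-trained initialization may help when the failure mode is poor extrapolation, but not when it stems from biases in the fine-tuning data.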

Cite

Text

Cohen-Wang et al. "Ask Your Distribution Shift if Pre-Training Is Right for You." NeurIPS 2023 Workshops: DistShift, 2023.

Markdown

[Cohen-Wang et al. "Ask Your Distribution Shift if Pre-Training Is Right for You." NeurIPS 2023 Workshops: DistShift, 2023.](https://mlanthology.org/neuripsw/2023/cohenwang2023neuripsw-ask/)

BibTeX

@inproceedings{cohenwang2023neuripsw-ask,
  title     = {{Ask Your Distribution Shift if Pre-Training Is Right for You}},
  author    = {Cohen-Wang, Benjamin and Vendrow, Joshua and Madry, Aleksander},
  booktitle = {NeurIPS 2023 Workshops: DistShift},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/cohenwang2023neuripsw-ask/}
}