An Empirical Study of Pre-Trained Vision Models on Out-of-Distribution Generalization
Abstract
Generalizing to out-of-distribution (OOD) data -- that is, data from domains unseen during training -- is a key challenge in modern machine learning that has only recently received significant attention. Some existing approaches propose leveraging larger models and pre-training on larger datasets. In this paper, we provide new insights into applying these approaches. Concretely, we show that larger models and larger datasets must be leveraged simultaneously to improve OOD performance on image classification. Moreover, we show that using smaller learning rates during fine-tuning is critical to achieving good results, contrary to the popular intuition that larger learning rates generalize better when training from scratch. We show that strategies that improve in-distribution accuracy may, counter-intuitively, lead to poor OOD performance despite strong in-distribution performance. Our insights culminate in a method that achieves state-of-the-art results on a number of OOD generalization benchmark tasks, often by a significant margin.
Cite
Text
Yu et al. "An Empirical Study of Pre-Trained Vision Models on Out-of-Distribution Generalization." NeurIPS 2021 Workshops: DistShift, 2021.
Markdown
[Yu et al. "An Empirical Study of Pre-Trained Vision Models on Out-of-Distribution Generalization." NeurIPS 2021 Workshops: DistShift, 2021.](https://mlanthology.org/neuripsw/2021/yu2021neuripsw-empirical/)
BibTeX
@inproceedings{yu2021neuripsw-empirical,
title = {{An Empirical Study of Pre-Trained Vision Models on Out-of-Distribution Generalization}},
author = {Yu, Yaodong and Jiang, Heinrich and Bahri, Dara and Mobahi, Hossein and Kim, Seungyeon and Rawat, Ankit Singh and Veit, Andreas and Ma, Yi},
booktitle = {NeurIPS 2021 Workshops: DistShift},
year = {2021},
url = {https://mlanthology.org/neuripsw/2021/yu2021neuripsw-empirical/}
}