Robust Fine-Tuning of Zero-Shot Models
Abstract
Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific dataset). Although existing fine-tuning approaches substantially improve accuracy in-distribution, they often reduce out-of-distribution robustness. We address this tension by introducing a simple and effective method for improving robustness: ensembling the weights of the zero-shot and fine-tuned models (WiSE-FT). Compared to standard fine-tuning, WiSE-FT provides large accuracy improvements out-of-distribution, while preserving high in-distribution accuracy. On ImageNet (in-distribution) and five derived distribution shifts, WiSE-FT improves out-of-distribution accuracy by 4 to 6 percentage points (pp) over prior work while increasing in-distribution accuracy by 1.6 pp. WiSE-FT achieves similarly large robustness improvements (2 to 23 pp) on a diverse set of six further distribution shifts, and in-distribution accuracy gains of 0.8 to 3.3 pp compared to standard fine-tuning on seven commonly used transfer learning datasets. These improvements come at no additional computational cost during fine-tuning or inference.
Cite
Text
Wortsman et al. "Robust Fine-Tuning of Zero-Shot Models." NeurIPS 2021 Workshops: DistShift, 2021.Markdown
[Wortsman et al. "Robust Fine-Tuning of Zero-Shot Models." NeurIPS 2021 Workshops: DistShift, 2021.](https://mlanthology.org/neuripsw/2021/wortsman2021neuripsw-robust/)BibTeX
@inproceedings{wortsman2021neuripsw-robust,
title = {{Robust Fine-Tuning of Zero-Shot Models}},
author = {Wortsman, Mitchell and Ilharco, Gabriel and Kim, Jong Wook and Li, Mike and Kornblith, Simon and Roelofs, Rebecca and Gontijo-Lopes, Raphael and Hajishirzi, Hanna and Farhadi, Ali and Namkoong, Hongseok and Schmidt, Ludwig},
booktitle = {NeurIPS 2021 Workshops: DistShift},
year = {2021},
url = {https://mlanthology.org/neuripsw/2021/wortsman2021neuripsw-robust/}
}