Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
Abstract
A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task. This paper shows that in such settings, selectively fine-tuning a subset of layers (which we term surgical fine-tuning) matches or outperforms commonly used fine-tuning approaches. Moreover, the type of distribution shift influences which subset is more effective to tune: for example, for image corruptions, fine-tuning only the first few layers works best. We validate our findings systematically across seven real-world data tasks spanning three types of distribution shifts. Theoretically, we prove that for two-layer neural networks in an idealized setting, first-layer tuning can outperform fine-tuning all layers. Intuitively, fine-tuning more parameters on a small target dataset can cause information learned during pre-training to be forgotten, and the relevant information depends on the type of shift.
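The recipe described in the abstract, freezing all parameters except a chosen subset of layers and fine-tuning only that subset on the target data, is straightforward to implement in standard frameworks. Below is a minimal, hypothetical PyTorch sketch for the image-corruption case (tuning only the earliest layers); the torchvision ResNet-50, the specific layer names, and the optimizer settings are illustrative assumptions, not taken from the paper or its released code.

```python
# Minimal sketch of surgical fine-tuning (assumed setup: torchvision ResNet-50,
# same label space as pre-training, small target dataset with corrupted images).
import torch
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V2")

# Freeze every parameter first.
for param in model.parameters():
    param.requires_grad = False

# For input-level shifts such as image corruptions, the paper reports that
# tuning only the first few layers works best; here we unfreeze the stem and
# the first residual block (an illustrative choice of "first layers").
for module in (model.conv1, model.bn1, model.layer1):
    for param in module.parameters():
        param.requires_grad = True

# Optimize only the unfrozen parameters on the target data.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3,
    momentum=0.9,
)
```

For other shift types, the same pattern applies with a different subset unfrozen (e.g., later layers or the head), which is the sense in which the effective choice of layers depends on the type of distribution shift.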
Cite
Text
Lee et al. "Surgical Fine-Tuning Improves Adaptation to Distribution Shifts." NeurIPS 2022 Workshops: ICBINB, 2022.
Markdown
[Lee et al. "Surgical Fine-Tuning Improves Adaptation to Distribution Shifts." NeurIPS 2022 Workshops: ICBINB, 2022.](https://mlanthology.org/neuripsw/2022/lee2022neuripsw-surgical-a/)
BibTeX
@inproceedings{lee2022neuripsw-surgical-a,
  title = {{Surgical Fine-Tuning Improves Adaptation to Distribution Shifts}},
  author = {Lee, Yoonho and Chen, Annie S and Tajwar, Fahim and Kumar, Ananya and Yao, Huaxiu and Liang, Percy and Finn, Chelsea},
  booktitle = {NeurIPS 2022 Workshops: ICBINB},
  year = {2022},
  url = {https://mlanthology.org/neuripsw/2022/lee2022neuripsw-surgical-a/}
}