Shift and Scale Is Detrimental to Few-Shot Transfer

Abstract

Batch normalization is a common component of computer vision models, including those typically used for few-shot learning. As applied in convolutional networks, batch normalization consists of a normalization step followed by per-channel trainable affine parameters that shift and scale the normalized features. These affine parameters can speed up model convergence on a source task. However, we demonstrate in this work that, on common few-shot learning benchmarks, training a model on a source task with these affine parameters is detrimental to downstream transfer performance. We study this effect for several methods on well-known benchmarks such as the cross-domain few-shot learning (CD-FSL) benchmark and few-shot image classification on miniImageNet. We find consistent performance gains from omitting the affine parameters, particularly in settings with more distant transfer tasks. Improvements from this low-cost, easy-to-implement modification are shown to rival the gains obtained by more sophisticated and costly methods.
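As a point of reference, the sketch below illustrates the kind of modification the abstract describes: a convolutional backbone whose batch-normalization layers omit the trainable shift-and-scale (affine) parameters. This is a minimal PyTorch sketch under our own assumptions; SimpleBackbone, conv_block, the layer sizes, and the 84x84 input are illustrative choices, not the authors' exact architecture or code.

# Minimal PyTorch sketch (illustrative; not the authors' exact code).
# The only change relevant to the paper's idea is affine=False in BatchNorm2d,
# which removes the per-channel trainable shift (beta) and scale (gamma)
# while keeping the normalization step itself.
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int, use_affine: bool) -> nn.Sequential:
    """Conv -> BatchNorm -> ReLU -> MaxPool; use_affine toggles the BN shift/scale."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch, affine=use_affine),  # affine=False drops gamma/beta
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class SimpleBackbone(nn.Module):
    """Hypothetical 4-layer conv encoder, similar in spirit to common few-shot backbones."""

    def __init__(self, use_affine: bool = False, hidden: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, hidden, use_affine),
            conv_block(hidden, hidden, use_affine),
            conv_block(hidden, hidden, use_affine),
            conv_block(hidden, hidden, use_affine),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.flatten(self.features(x), start_dim=1)


if __name__ == "__main__":
    model = SimpleBackbone(use_affine=False)   # train on the source task without shift/scale
    dummy = torch.randn(8, 3, 84, 84)          # e.g. miniImageNet-sized inputs
    print(model(dummy).shape)

Note that affine=False leaves the normalization and running-statistics tracking in place, so model size and training cost are essentially unchanged; only the learned per-channel shift and scale are removed.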

Cite

Text

Yazdanpanah et al. "Shift and Scale Is Detrimental to Few-Shot Transfer." NeurIPS 2021 Workshops: DistShift, 2021.

Markdown

[Yazdanpanah et al. "Shift and Scale Is Detrimental to Few-Shot Transfer." NeurIPS 2021 Workshops: DistShift, 2021.](https://mlanthology.org/neuripsw/2021/yazdanpanah2021neuripsw-shift/)

BibTeX

@inproceedings{yazdanpanah2021neuripsw-shift,
  title     = {{Shift and Scale Is Detrimental to Few-Shot Transfer}},
  author    = {Yazdanpanah, Moslem and Rahman, Aamer Abdul and Desrosiers, Christian and Havaei, Mohammad and Belilovsky, Eugene and Kahou, Samira Ebrahimi},
  booktitle = {NeurIPS 2021 Workshops: DistShift},
  year      = {2021},
  url       = {https://mlanthology.org/neuripsw/2021/yazdanpanah2021neuripsw-shift/}
}