Memory-Efficient Selective Fine-Tuning
Abstract
We propose an approach for reducing the memory required to fine-tune transformer-based models. During the backward pass, our approach propagates the gradient through only a small number of input positions, while freezing the others. Thus, during the forward pass, we only save the subset of intermediate activations for which the computed gradient will be nonzero. We show that our approach leads to performance on par with full fine-tuning, while requiring no more than a third of the GPU memory. Our approach is particularly efficient for fine-tuning language models with on the order of a hundred million parameters, and it makes it possible to fine-tune such models on consumer hardware while maintaining a large batch size.
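To make the idea concrete, the sketch below restricts gradient propagation to a randomly chosen subset of sequence positions by detaching the hidden states of the remaining positions, so autograd only tracks the selected positions. This is a minimal PyTorch approximation, not the authors' implementation: the helper names (`keep_fraction_mask`, `freeze_unselected_positions`), the 25% fraction, and the random selection strategy are assumptions for illustration only.

```python
import torch

def keep_fraction_mask(seq_len, fraction=0.25, device=None):
    # Randomly choose a fraction of sequence positions that will receive gradients.
    num_keep = max(1, int(fraction * seq_len))
    idx = torch.randperm(seq_len, device=device)[:num_keep]
    mask = torch.zeros(seq_len, dtype=torch.bool, device=device)
    mask[idx] = True
    return mask  # shape: (seq_len,), True for trainable positions

def freeze_unselected_positions(hidden, mask):
    # hidden: (batch, seq_len, dim); mask: (seq_len,) boolean.
    # Gradients flow only through positions where mask is True; the other
    # positions are detached, so no gradient is computed for them.
    mask = mask.view(1, -1, 1)
    return torch.where(mask, hidden, hidden.detach())

# Illustrative usage before feeding hidden states into transformer layers:
batch, seq_len, dim = 8, 128, 768
hidden = torch.randn(batch, seq_len, dim, requires_grad=True)
mask = keep_fraction_mask(seq_len, fraction=0.25)
hidden = freeze_unselected_positions(hidden, mask)
# ... hidden is then passed through the model as usual.
```

In this reading of the abstract, the memory saving would come from only storing, during the forward pass, the activations needed to compute gradients at the selected positions; how that bookkeeping is done inside the attention and feed-forward layers is specific to the paper's implementation.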
Cite
Text
Simoulin et al. "Memory-Efficient Selective Fine-Tuning." ICML 2023 Workshops: ES-FoMO, 2023.
Markdown
[Simoulin et al. "Memory-Efficient Selective Fine-Tuning." ICML 2023 Workshops: ES-FoMO, 2023.](https://mlanthology.org/icmlw/2023/simoulin2023icmlw-memoryefficient/)
BibTeX
@inproceedings{simoulin2023icmlw-memoryefficient,
title = {{Memory-Efficient Selective Fine-Tuning}},
author = {Simoulin, Antoine and Park, Namyong and Liu, Xiaoyi and Yang, Grey},
booktitle = {ICML 2023 Workshops: ES-FoMO},
year = {2023},
url = {https://mlanthology.org/icmlw/2023/simoulin2023icmlw-memoryefficient/}
}