Make Pre-Trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning

Abstract

Parameter-efficient fine-tuning (PEFT) of pre-trained language models (PLMs) has emerged as a highly successful approach, training only a small number of parameters without sacrificing performance, and it has become the de facto learning paradigm as PLMs grow in size. However, existing PEFT methods are not memory-efficient, because they still require caching most of the intermediate activations for gradient computation, just as full fine-tuning does. One effective way to reduce activation memory is to use a reversible model, in which intermediate activations need not be cached and can instead be recomputed. Nevertheless, modifying a PLM into its reversible variant is not straightforward, since reversible models have a distinct architecture from currently released PLMs. In this paper, we first investigate what is a key factor for the success of existing PEFT methods and find that it is essential to preserve the PLM's starting point when initializing a PEFT method. With this finding, we propose memory-efficient fine-tuning (MEFT), which inserts adapters into a PLM in a way that preserves the PLM's starting point and makes it reversible without additional pre-training. We evaluate MEFT on the GLUE benchmark and five question-answering tasks with various backbones (BERT, RoBERTa, BART, and OPT). MEFT significantly reduces activation memory, by up to 84% compared to full fine-tuning, with a negligible number of trainable parameters. Moreover, MEFT matches full fine-tuning on GLUE and achieves comparable scores on the question-answering tasks. A similar finding also holds for image classification.
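The core idea the abstract refers to is a reversible (additive coupling) block: because the inputs can be recovered exactly from the outputs, activations do not need to be cached for the backward pass. The sketch below is a minimal illustration of that coupling in PyTorch, not the authors' MEFT implementation; the names `ReversibleBlock`, `f`, and `g`, and the zero-initialization detail are illustrative assumptions.

```python
# Minimal sketch of a reversible additive-coupling block (illustrative only,
# not the authors' MEFT code). Inputs can be reconstructed from outputs,
# so intermediate activations need not be cached during training.
import torch
import torch.nn as nn


class ReversibleBlock(nn.Module):
    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f = f  # e.g. a frozen PLM sub-layer combined with a small adapter
        self.g = g

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    @torch.no_grad()
    def inverse(self, y1, y2):
        # Recompute the inputs from the outputs instead of storing them.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2


if __name__ == "__main__":
    d = 16
    f = nn.Linear(d, d)
    g = nn.Linear(d, d)
    # Zero-initializing one map keeps the block close to the pre-trained
    # starting point, in the spirit of the paper's finding (assumption here).
    nn.init.zeros_(g.weight)
    nn.init.zeros_(g.bias)

    block = ReversibleBlock(f, g)
    x1, x2 = torch.randn(2, d), torch.randn(2, d)
    y1, y2 = block(x1, x2)
    rx1, rx2 = block.inverse(y1, y2)
    print(torch.allclose(x1, rx1, atol=1e-6), torch.allclose(x2, rx2, atol=1e-6))
```

Running the script prints `True True`, confirming that the block's inputs are exactly recoverable from its outputs, which is what allows activation recomputation instead of caching.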

Cite

Text

Liao et al. "Make Pre-Trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning." Neural Information Processing Systems, 2023.

Markdown

[Liao et al. "Make Pre-Trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/liao2023neurips-make/)

BibTeX

@inproceedings{liao2023neurips-make,
  title     = {{Make Pre-Trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning}},
  author    = {Liao, Baohao and Tan, Shaomu and Monz, Christof},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/liao2023neurips-make/}
}