Parameter-Efficient Transfer Learning for NLP

Abstract

Fine-tuning large pretrained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate adapters' effectiveness, we transfer the recently proposed BERT Transformer model to $26$ diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance, whilst adding only a few parameters per task. On GLUE, we attain within $0.8\%$ of the performance of full fine-tuning, adding only $3.6\%$ parameters per task. By contrast, fine-tuning trains $100\%$ of the parameters per task.
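
The abstract describes adapters as small trainable modules inserted into a frozen pretrained network. The sketch below is a minimal, illustrative PyTorch version of such a bottleneck adapter (down-projection, nonlinearity, up-projection, residual connection); the layer sizes, initialization, and the stand-in "pretrained" layer are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual.

    Only these few parameters are trained per task; the surrounding
    pretrained network stays frozen. Sizes here are illustrative.
    """

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.GELU()
        # Small init so the adapter starts close to an identity map
        # and does not disturb the pretrained representations early on.
        nn.init.normal_(self.down.weight, std=1e-3)
        nn.init.zeros_(self.down.bias)
        nn.init.normal_(self.up.weight, std=1e-3)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


if __name__ == "__main__":
    # Freeze a stand-in pretrained sublayer and train only the adapter.
    pretrained = nn.Linear(768, 768)   # placeholder for a Transformer sublayer
    for p in pretrained.parameters():
        p.requires_grad = False

    adapter = Adapter()
    x = torch.randn(2, 16, 768)        # (batch, sequence, hidden)
    out = adapter(pretrained(x))
    trainable = sum(p.numel() for p in adapter.parameters())
    print(out.shape, f"trainable adapter params: {trainable}")
```

Because only the adapter (and typically task-specific layer norms and the classification head) is updated, each new task adds a few percent of parameters while the shared pretrained weights remain untouched.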

Cite

Text

Houlsby et al. "Parameter-Efficient Transfer Learning for NLP." International Conference on Machine Learning, 2019.

Markdown

[Houlsby et al. "Parameter-Efficient Transfer Learning for NLP." International Conference on Machine Learning, 2019.](https://mlanthology.org/icml/2019/houlsby2019icml-parameterefficient/)

BibTeX

@inproceedings{houlsby2019icml-parameterefficient,
  title     = {{Parameter-Efficient Transfer Learning for NLP}},
  author    = {Houlsby, Neil and Giurgiu, Andrei and Jastrzebski, Stanislaw and Morrone, Bruna and De Laroussilhe, Quentin and Gesmundo, Andrea and Attariyan, Mona and Gelly, Sylvain},
  booktitle = {International Conference on Machine Learning},
  year      = {2019},
  pages     = {2790--2799},
  volume    = {97},
  url       = {https://mlanthology.org/icml/2019/houlsby2019icml-parameterefficient/}
}