Asymmetry in Low-Rank Adapters of Foundation Models

Abstract

Parameter-efficient fine-tuning optimizes large, pre-trained foundation models by updating a subset of parameters; in this class, Low-Rank Adaptation (LoRA) is particularly effective. Motivated by an investigation of the different roles of the LoRA matrices during fine-tuning, this paper characterizes and leverages an unexpected asymmetry in the importance of the low-rank adapter matrices. Specifically, when updating the parameter matrices of a neural network by adding a product $BA$, we observe that the $B$ and $A$ matrices have distinct functions: $A$ extracts features from the input, while $B$ uses these features to create the desired output. Based on this observation, we demonstrate that fine-tuning $B$ is inherently more effective than fine-tuning $A$, and that a random untrained $A$ should perform nearly as well as a fine-tuned one. Using an information-theoretic lens, we also bound the generalization of low-rank adapters, showing that the parameter savings of exclusively training $B$ improve the bound. We support our conclusions with experiments on RoBERTa, BART, LLaMA-2, and ViT.
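
As a concrete illustration of the adapter update described above, the following PyTorch sketch wraps a linear layer so that the pre-trained weight $W$ and a randomly initialized $A$ stay frozen while only $B$ is trained, computing $x \mapsto Wx + B(Ax)$. This is a minimal sketch under stated assumptions, not the authors' implementation; the class and argument names (LoRALinearBOnly, rank, alpha) are illustrative.

# Minimal sketch of the train-B-only LoRA variant; names are illustrative assumptions.
import torch
import torch.nn as nn


class LoRALinearBOnly(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pre-trained weight W (and bias)

        d_out, d_in = base.weight.shape
        # A: random, frozen "feature extractor" (a buffer, so it is never updated)
        self.register_buffer("A", torch.randn(rank, d_in) / rank**0.5)
        # B: trainable "output projection", initialized to zero so the adapter starts as a no-op
        self.B = nn.Parameter(torch.zeros(d_out, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W x + scaling * B (A x); only B receives gradients
        return self.base(x) + self.scaling * (x @ self.A.T) @ self.B.T


# Usage: wrap an existing linear layer and train only its B matrix.
layer = LoRALinearBOnly(nn.Linear(768, 768), rank=8)
trainable = [p for p in layer.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable))  # 768 * 8 parameters, all from B

Freezing $A$ as a random projection halves the adapter's trainable parameters relative to standard LoRA at the same rank, which is the parameter saving the generalization bound in the abstract refers to.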

Cite

Text

Zhu et al. "Asymmetry in Low-Rank Adapters of Foundation Models." ICLR 2024 Workshops: ME-FoMo, 2024.

Markdown

[Zhu et al. "Asymmetry in Low-Rank Adapters of Foundation Models." ICLR 2024 Workshops: ME-FoMo, 2024.](https://mlanthology.org/iclrw/2024/zhu2024iclrw-asymmetry/)

BibTeX

@inproceedings{zhu2024iclrw-asymmetry,
  title     = {{Asymmetry in Low-Rank Adapters of Foundation Models}},
  author    = {Zhu, Jiacheng and Greenewald, Kristjan and Nadjahi, Kimia and de Ocáriz Borde, Haitz Sáez and Gabrielsson, Rickard Brüel and Choshen, Leshem and Ghassemi, Marzyeh and Yurochkin, Mikhail and Solomon, Justin},
  booktitle = {ICLR 2024 Workshops: ME-FoMo},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/zhu2024iclrw-asymmetry/}
}