LoRA Training in the NTK Regime Has No Spurious Local Minima

Abstract

Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models (LLMs), but our theoretical understanding of LoRA has been limited. In this work, we theoretically analyze LoRA fine-tuning in the neural tangent kernel (NTK) regime with $N$ data points, showing: (i) full fine-tuning (without LoRA) admits a low-rank solution of rank $r\lesssim \sqrt{N}$; (ii) using LoRA with rank $r\gtrsim \sqrt{N}$ eliminates spurious local minima, allowing gradient descent to find the low-rank solutions; (iii) the low-rank solution found using LoRA generalizes well.
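For readers unfamiliar with the setup, the sketch below illustrates the standard LoRA parameterization the abstract refers to, with the trainable rank chosen on the order of $\sqrt{N}$ as in result (ii). This is a minimal, hypothetical illustration (dimensions, initialization scale, and variable names are assumptions, not code from the paper).

```python
import numpy as np

# Minimal sketch of the standard LoRA parameterization: the frozen pretrained
# weight W0 plus a trainable low-rank update B @ A of rank r.
# Dimensions and the choice r ~ sqrt(N) are illustrative assumptions.

d_out, d_in = 768, 768          # hypothetical weight dimensions
N = 1024                        # number of fine-tuning data points
r = int(np.ceil(np.sqrt(N)))    # LoRA rank on the order of sqrt(N)

rng = np.random.default_rng(0)
W0 = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)  # frozen pretrained weight

# Only the low-rank factors B and A are updated during fine-tuning.
B = np.zeros((d_out, r))                                  # zero init (standard LoRA)
A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)

def lora_forward(x):
    """Forward pass through the adapted weight W0 + B @ A."""
    return x @ (W0 + B @ A).T

x = rng.standard_normal((4, d_in))
print(lora_forward(x).shape)  # (4, d_out)
```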

Cite

Text

Jang et al. "LoRA Training in the NTK Regime Has No Spurious Local Minima." International Conference on Machine Learning, 2024.

Markdown

[Jang et al. "LoRA Training in the NTK Regime Has No Spurious Local Minima." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/jang2024icml-lora/)

BibTeX

@inproceedings{jang2024icml-lora,
  title     = {{LoRA Training in the NTK Regime Has No Spurious Local Minima}},
  author    = {Jang, Uijeong and Lee, Jason D. and Ryu, Ernest K.},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {21306--21328},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/jang2024icml-lora/}
}