Local LoRA: Memory-Efficient Fine-Tuning of Large Language Models

Abstract

We present Local LoRA, a memory-flexible fine-tuning approach that, in principle, can fine-tune an arbitrarily large model on fixed hardware, including consumer-grade GPUs. Our approach aims to decouple the size of the model from the memory required to fine-tune it by dividing the model into chunks and sequentially fine-tuning each chunk. Our results show that Local LoRA closes the gap between the un-tuned model and end-to-end LoRA on math reasoning tasks.
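The sketch below illustrates the chunk-wise idea described in the abstract: attach LoRA adapters to one chunk of layers at a time, train only that chunk, then move on, so trainable parameters and optimizer state stay bounded regardless of model depth. It is a minimal toy example, not the authors' implementation; the names (LoRALinear, local_lora_finetune) and the use of plain linear blocks in place of transformer layers are assumptions made for illustration.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank (LoRA) update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.02)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T) @ self.B.T * self.scale

def local_lora_finetune(blocks, batches, n_chunks=4, steps_per_chunk=50):
    """Sequentially attach and train LoRA adapters on one chunk of blocks at a time."""
    chunk_size = (len(blocks) + n_chunks - 1) // n_chunks
    for start in range(0, len(blocks), chunk_size):
        end = min(start + chunk_size, len(blocks))
        # Attach adapters only to the current chunk; all other weights remain frozen.
        for i in range(start, end):
            blocks[i] = LoRALinear(blocks[i])
        model = nn.Sequential(*blocks)
        params = [p for i in range(start, end)
                  for p in blocks[i].parameters() if p.requires_grad]
        opt = torch.optim.AdamW(params, lr=1e-4)
        for _ in range(steps_per_chunk):
            for x, y in batches:
                loss = nn.functional.mse_loss(model(x), y)
                loss.backward()
                opt.step()
                opt.zero_grad()
        # Freeze the chunk just trained before moving on to the next one.
        for p in params:
            p.requires_grad = False

# Toy usage: a "model" of 8 linear blocks fine-tuned in 4 chunks of 2.
blocks = [nn.Linear(16, 16) for _ in range(8)]
batches = [(torch.randn(4, 16), torch.randn(4, 16))]
local_lora_finetune(blocks, batches)

In this toy version the peak trainable-parameter count is that of a single chunk's adapters, which is what lets the memory footprint stay fixed as the number of blocks grows.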

Cite

Text

Key et al. "Local LoRA: Memory-Efficient Fine-Tuning of Large Language Models." NeurIPS 2023 Workshops: WANT, 2023.

Markdown

[Key et al. "Local LoRA: Memory-Efficient Fine-Tuning of Large Language Models." NeurIPS 2023 Workshops: WANT, 2023.](https://mlanthology.org/neuripsw/2023/key2023neuripsw-local/)

BibTeX

@inproceedings{key2023neuripsw-local,
  title     = {{Local LoRA: Memory-Efficient Fine-Tuning of Large Language Models}},
  author    = {Key, Oscar and Kaddour, Jean and Minervini, Pasquale},
  booktitle = {NeurIPS 2023 Workshops: WANT},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/key2023neuripsw-local/}
}