LoQT: Low-Rank Adapters for Quantized Pretraining

Abstract

Despite advances using low-rank adapters and quantization, pretraining of large models on consumer hardware has not been possible without model sharding, offloading during training, or per-layer gradient updates. To address these limitations, we propose Low-Rank Adapters for Quantized Training (LoQT), a method for efficiently training quantized models. LoQT uses gradient-based tensor factorization to initialize low-rank trainable weight matrices that are periodically merged into quantized full-rank weight matrices. Our approach is suitable for both pretraining and fine-tuning models. We demonstrate this for language modeling and downstream task adaptation, finding that LoQT enables efficient training of models up to 7B parameters on a 24GB GPU. We also demonstrate the feasibility of training a 13B model using per-layer gradient updates on the same hardware.
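
As a rough illustration of the loop the abstract describes, the toy PyTorch module below pairs a frozen, quantized weight with a low-rank update that is periodically merged back into the full-rank weight and re-quantized. The module name, the int8 quantizer, the SVD-based initialization of the fixed projection P, the choice of training only the factor B, and the merge schedule are illustrative assumptions for this sketch, not the paper's exact recipe.

import torch
import torch.nn as nn

def quantize(w: torch.Tensor, bits: int = 8):
    """Simple symmetric per-tensor quantization (a stand-in for the real scheme)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

class LowRankQuantizedLinear(nn.Module):
    """y = x (W_q + P B)^T, with W_q frozen and quantized; only B is trained."""

    def __init__(self, weight: torch.Tensor, rank: int):
        super().__init__()
        self.rank = rank
        self.q, self.scale = quantize(weight)  # frozen, quantized full-rank weight
        out_features, in_features = weight.shape
        # P is a fixed projection; B is the trainable low-rank factor.
        self.register_buffer("P", torch.zeros(out_features, rank))
        self.B = nn.Parameter(torch.zeros(rank, in_features))

    @torch.no_grad()
    def init_from_gradient(self, grad: torch.Tensor):
        """Initialize P from the top singular vectors of a full-rank gradient
        (one possible reading of 'gradient-based' initialization)."""
        U, _, _ = torch.linalg.svd(grad, full_matrices=False)
        self.P.copy_(U[:, : self.rank])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantizing the full weight here is wasteful; done only for clarity.
        w = dequantize(self.q, self.scale) + self.P @ self.B
        return x @ w.t()

    @torch.no_grad()
    def merge_and_requantize(self):
        """Fold the low-rank update into the full-rank weight, then re-quantize."""
        w = dequantize(self.q, self.scale) + self.P @ self.B
        self.q, self.scale = quantize(w)
        self.B.zero_()

In a training loop, only B (and the corresponding adapter factors in other layers) would receive optimizer updates, with merge_and_requantize called every fixed number of steps; that periodic fold-and-requantize is the merge the abstract refers to.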

Cite

Text

Loeschcke et al. "LoQT: Low-Rank Adapters for Quantized Pretraining." Neural Information Processing Systems, 2024. doi:10.52202/079017-3661

Markdown

[Loeschcke et al. "LoQT: Low-Rank Adapters for Quantized Pretraining." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/loeschcke2024neurips-loqt/) doi:10.52202/079017-3661

BibTeX

@inproceedings{loeschcke2024neurips-loqt,
  title     = {{LoQT: Low-Rank Adapters for Quantized Pretraining}},
  author    = {Loeschcke, Sebastian and Toftrup, Mads and Kastoryano, Michael J. and Belongie, Serge and Snæbjarnarson, Vésteinn},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-3661},
  url       = {https://mlanthology.org/neurips/2024/loeschcke2024neurips-loqt/}
}