Sparse Gradient Compression for Fine-Tuning Large Language Models

Abstract

Fine-tuning large language models (LLMs) for downstream tasks has become increasingly important due to their widespread use and the growing availability of open-source models. However, the high memory cost of fine-tuning remains a significant challenge, especially as models grow in size. Parameter-efficient fine-tuning (PEFT) methods have been proposed to reduce the number of parameters updated during fine-tuning, but these approaches often tie the number of optimizer states to the dimensions of the model parameters, limiting flexibility and control. In this paper, we propose sparse gradient compression (SGC), a training regime that leverages the inherent sparsity of gradients to compress optimizer states by projecting them onto a low-dimensional subspace whose dimensionality is independent of the original model's parameters. By enabling optimizer state updates in an arbitrary low-dimensional subspace, SGC offers a flexible tradeoff between memory efficiency and performance. Fine-tuning LLMs on downstream tasks, we show that SGC delivers superior performance while substantially lowering optimizer state memory requirements, particularly in data-limited and memory-limited settings.
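The abstract does not spell out SGC's exact projection or update rule, but the core idea of keeping optimizer states in a subspace whose size is decoupled from the parameter count can be sketched as follows. This is an illustrative sketch only: it assumes a simple top-k magnitude projection of each gradient and Adam-style moments stored with only k entries per parameter; the class name `SGCAdam` and the hyperparameter `k` are hypothetical and not taken from the paper.

```python
# Minimal sketch, NOT the authors' algorithm: compresses each gradient to its
# k largest-magnitude entries (assumed projection) and keeps Adam-style moments
# only for those k coordinates, so optimizer-state memory scales with k rather
# than with the parameter dimension. Assumes every parameter has >= k elements.
import torch


class SGCAdam:
    """Adam-like update whose optimizer state lives in a k-dimensional subspace."""

    def __init__(self, params, k, lr=1e-4, betas=(0.9, 0.999), eps=1e-8):
        self.params = list(params)
        self.k = k  # subspace dimension, chosen independently of model size
        self.lr, self.betas, self.eps = lr, betas, eps
        self.t = 0
        # Moments are stored per parameter but hold only k entries each.
        self.m = [torch.zeros(k, device=p.device, dtype=p.dtype) for p in self.params]
        self.v = [torch.zeros(k, device=p.device, dtype=p.dtype) for p in self.params]

    @torch.no_grad()
    def step(self):
        self.t += 1
        b1, b2 = self.betas
        for p, m, v in zip(self.params, self.m, self.v):
            if p.grad is None:
                continue
            g = p.grad.flatten()
            # Assumed sparsification: keep the k largest-magnitude gradient entries.
            idx = torch.topk(g.abs(), self.k).indices
            g_k = g[idx]
            # Adam moments are updated only in the k-dimensional compressed space
            # (the selected coordinates may change between steps in this sketch).
            m.mul_(b1).add_(g_k, alpha=1 - b1)
            v.mul_(b2).addcmul_(g_k, g_k, value=1 - b2)
            m_hat = m / (1 - b1 ** self.t)
            v_hat = v / (1 - b2 ** self.t)
            # Scatter the compressed update back to the full parameter.
            update = torch.zeros_like(g)
            update[idx] = m_hat / (v_hat.sqrt() + self.eps)
            p.add_(update.view_as(p), alpha=-self.lr)
```

The key design point this sketch illustrates is that `k` is a free knob: shrinking it trades fine-tuning fidelity for optimizer-state memory, independent of the model's parameter shapes.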

Cite

Text

Yang et al. "Sparse Gradient Compression for Fine-Tuning Large Language Models." ICLR 2025 Workshops: SLLM, 2025.

Markdown

[Yang et al. "Sparse Gradient Compression for Fine-Tuning Large Language Models." ICLR 2025 Workshops: SLLM, 2025.](https://mlanthology.org/iclrw/2025/yang2025iclrw-sparse/)

BibTeX

@inproceedings{yang2025iclrw-sparse,
  title     = {{Sparse Gradient Compression for Fine-Tuning Large Language Models}},
  author    = {Yang, David H. and Amiri, Mohammad Mohammadi and Pedapati, Tejaswini and Chaudhury, Subhajit and Chen, Pin-Yu},
  booktitle = {ICLR 2025 Workshops: SLLM},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/yang2025iclrw-sparse/}
}