Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity

Abstract

Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning large language models (LLMs) using only forward passes. However, applying ZO fine-tuning in memory-constrained settings such as mobile phones and laptops remains challenging, since these settings often involve weight quantization, while ZO requires full-precision perturbations and updates. In this study, we address this limitation by combining static sparse ZO fine-tuning with quantization. Our approach transfers a small, static subset (0.1%) of "sensitive" parameters from pre-training to downstream tasks, restricting fine-tuning to this sparse set. The remaining untuned parameters are quantized, reducing memory demands. The proposed workflow enables efficient ZO fine-tuning of a Llama2-7B model on a GPU with less than 8GB of memory, while outperforming full-model ZO fine-tuning and in-context learning.
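To make the idea concrete, below is a minimal sketch of a sparse SPSA-style zeroth-order step restricted to a precomputed binary mask of "sensitive" parameters. It is an illustration under assumptions, not the paper's implementation: the function name `sparse_zo_step`, the `masks` argument, and the hyperparameter values are hypothetical placeholders.

```python
import torch

def sparse_zo_step(model, loss_fn, masks, lr=1e-6, eps=1e-3, seed=0):
    """One SPSA-style zeroth-order step on masked (sensitive) parameters only.

    masks: dict mapping parameter name -> boolean tensor marking the ~0.1%
    of weights that are perturbed and updated; all other weights stay frozen
    (and could remain quantized). Illustrative sketch, not the paper's code.
    """
    params = [(n, p) for n, p in model.named_parameters() if n in masks]

    def perturb(scale):
        # Re-seeding regenerates the same random directions z on every call,
        # so perturbations can be applied and undone without storing z.
        torch.manual_seed(seed)
        for n, p in params:
            z = torch.randn_like(p)
            p.data.add_(scale * eps * z * masks[n])

    # Two forward passes at theta + eps*z and theta - eps*z (no backward pass).
    perturb(+1.0)
    loss_plus = loss_fn(model)
    perturb(-2.0)
    loss_minus = loss_fn(model)
    perturb(+1.0)  # restore the original weights

    # Projected gradient estimate along z, applied only on the sparse mask.
    grad_scalar = (loss_plus - loss_minus) / (2 * eps)
    torch.manual_seed(seed)
    for n, p in params:
        z = torch.randn_like(p)
        p.data.add_(-lr * grad_scalar * z * masks[n])
    return loss_plus.item()
```

Because the random directions are regenerated from the seed, only the masked entries ever change and no gradient or optimizer state is stored, which is what keeps the memory footprint close to inference-only usage.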

Cite

Text

Guo et al. "Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity." International Conference on Learning Representations, 2025.

Markdown

[Guo et al. "Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/guo2025iclr-zerothorder/)

BibTeX

@inproceedings{guo2025iclr-zerothorder,
  title     = {{Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity}},
  author    = {Guo, Wentao and Long, Jikai and Zeng, Yimeng and Liu, Zirui and Yang, Xinyu and Ran, Yide and Gardner, Jacob R. and Bastani, Osbert and De Sa, Christopher and Yu, Xiaodong and Chen, Beidi and Xu, Zhaozhuo},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/guo2025iclr-zerothorder/}
}