PaZO: Preconditioned Accelerated Zeroth-Order Optimization for Fine-Tuning LLMs

Abstract

This paper introduces PaZO, a preconditioned accelerated zeroth-order optimization algorithm for fine-tuning large language models (LLMs). First, we theoretically demonstrate the necessity of preconditioning in zeroth-order optimization, proving that zeroth-order stochastic gradient descent (ZO-SGD) alone fails to achieve the ideal convergence rate. Building on this, we propose Preconditioned Simultaneous Perturbation Stochastic Approximation (PSPSA) and a theoretical version of PaZO, and show that setting the order of the preconditioner to $-1/2$ in PSPSA yields an improved convergence rate for PaZO. Moreover, we design a practical version of PaZO that stabilizes training via a diagonal Hessian estimate and a moving-average technique. Extensive experiments on diverse downstream tasks with models such as RoBERTa-large and OPT show PaZO’s effectiveness. Compared with other zeroth-order baselines, PaZO achieves better performance across models and tasks.
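To make the abstract's ingredients concrete, the sketch below applies a preconditioned SPSA-style zeroth-order update to a toy ill-conditioned quadratic: a two-sided SPSA gradient estimate, a diagonal curvature estimate maintained with a moving average, and a preconditioner of order $-1/2$ (the estimated gradient is scaled by $h^{-1/2}$). This is only an illustrative sketch; the function names, hyperparameters, and the crude curvature estimator are assumptions for demonstration, not the paper's PaZO algorithm.

# Minimal sketch of a preconditioned zeroth-order (SPSA-style) update on a toy
# quadratic. Names, hyperparameters, and the curvature proxy are illustrative
# assumptions, not the paper's exact PaZO algorithm.
import numpy as np

rng = np.random.default_rng(0)
A = np.diag(np.array([100.0, 10.0, 1.0]))  # ill-conditioned quadratic objective


def loss(x):
    return 0.5 * x @ A @ x


def spsa_grad(x, eps=1e-3):
    """Two-sided SPSA gradient estimate along a random Gaussian direction."""
    z = rng.standard_normal(x.shape)
    return (loss(x + eps * z) - loss(x - eps * z)) / (2.0 * eps) * z


def curvature_probe(x, eps=1e-3):
    """Crude per-coordinate curvature proxy from a zeroth-order second difference."""
    z = rng.standard_normal(x.shape)
    c = (loss(x + eps * z) - 2.0 * loss(x) + loss(x - eps * z)) / eps**2  # ~ z^T H z
    return np.abs(c) * z**2


x = np.ones(3)
lr, beta, delta = 0.05, 0.99, 1e-8

# Warm up the diagonal curvature estimate with a few probes before updating x.
h_diag = np.mean([curvature_probe(x) for _ in range(20)], axis=0)

for step in range(2000):
    g = spsa_grad(x)
    h_diag = beta * h_diag + (1.0 - beta) * curvature_probe(x)  # moving average
    x -= lr * g / np.sqrt(h_diag + delta)  # preconditioner of order -1/2

print("final loss:", loss(x))

The same update with h_diag held at 1 reduces to plain ZO-SGD; on this ill-conditioned example the $h^{-1/2}$ scaling is what lets a single learning rate work across coordinates with very different curvature, which mirrors the motivation for preconditioning stated in the abstract.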

Cite

Text

Zhao et al. "PaZO: Preconditioned Accelerated Zeroth-Order Optimization for Fine-Tuning LLMs." Advances in Neural Information Processing Systems, 2025.

Markdown

[Zhao et al. "PaZO: Preconditioned Accelerated Zeroth-Order Optimization for Fine-Tuning LLMs." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhao2025neurips-pazo/)

BibTeX

@inproceedings{zhao2025neurips-pazo,
  title     = {{PaZO: Preconditioned Accelerated Zeroth-Order Optimization for Fine-Tuning LLMs}},
  author    = {Zhao, Hanzhen and Ding, Shihong and Fang, Cong and Lin, Zhouchen},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/zhao2025neurips-pazo/}
}