FZOO: Fast Zeroth-Order Optimizer for Fine‑Tuning Large Language Models Towards Adam‑Scale Speed
Abstract
Fine-tuning large language models (LLMs) often runs into GPU memory bottlenecks: the backward pass of first-order optimizers such as Adam raises memory usage to more than 10 times the inference level (e.g., 633 GB for OPT-30B). Zeroth-order (ZO) optimizers avoid this cost by estimating gradients from forward passes alone, yet existing methods such as MeZO usually need tens of times more steps to converge. Can this trade-off between speed and memory in ZO optimization be fundamentally improved? Encouragingly, Normalized-SGD shows strong empirical performance with greater memory efficiency than Adam. In light of this, we introduce FZOO, a Fast Zeroth-Order Optimizer towards Adam-Scale Speed. First, FZOO reduces the total number of forward passes needed for convergence by using batched one-sided gradient estimates whose step sizes adapt to the standard deviation of the batch losses. Second, it accelerates per-batch computation by using Rademacher (±1) random perturbations, which also enable further speedups through batched evaluation. Extensive experiments on diverse models (including RoBERTa-large, the OPT family (350M to 66B), Phi-2, and Llama3) across 11 varied downstream tasks validate FZOO's effectiveness. On average, FZOO outperforms MeZO by 3% in accuracy while requiring 3× fewer forward passes. Notably, on RoBERTa-large, FZOO achieves an average accuracy improvement of 5.6% and an 18× reduction in forward passes compared with MeZO, reaching convergence speeds comparable to Adam. We also provide a theoretical analysis proving that FZOO is formally equivalent to a normalized-SGD update rule and establishing its convergence guarantees. Beyond full-parameter tuning, FZOO integrates smoothly with PEFT techniques, unlocking even larger memory savings. Taken together, our results make fast, full-parameter fine-tuning on a single GPU realistic today and point toward future work on memory-efficient pre-training. Code: https://github.com/DKmiyan/FZOO
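The abstract describes the update only in words, so here is a minimal pure-Python sketch of one step under our own assumptions; it is not the authors' implementation (that lives in the linked repository), and details such as how σ is computed or how the learning rate is scheduled may differ. We assume each step draws N Rademacher directions u_i, runs one unperturbed forward pass at θ plus one at each θ + εu_i, and updates θ ← θ − η/(Nσ) Σ_i (L(θ + εu_i) − L(θ)) u_i, where σ is the standard deviation of the N perturbed losses. The names `fzoo_style_step`, `rademacher`, `sphere`, `lr`, `eps`, and `n_perturb` are hypothetical. Dividing by σ is what ties this to normalized SGD: for small ε each loss difference is roughly ε·u_iᵀ∇L, and for Rademacher u the standard deviation of uᵀ∇L equals ‖∇L‖, so σ ≈ ε‖∇L‖ and the step is roughly η·∇L/‖∇L‖.

```python
import random
import statistics


def rademacher(dim, rng):
    """Draw a vector whose entries are independently -1.0 or +1.0."""
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]


def fzoo_style_step(loss_fn, theta, lr, eps, n_perturb, rng):
    """One hypothetical FZOO-style zeroth-order step (illustrative sketch only).

    Uses n_perturb one-sided finite differences along Rademacher directions,
    all sharing a single unperturbed forward pass, and divides by the standard
    deviation of the perturbed losses instead of by eps.
    """
    base_loss = loss_fn(theta)  # the one unperturbed forward pass
    dirs = [rademacher(len(theta), rng) for _ in range(n_perturb)]
    # Perturbed forward passes; they are independent, so they could be batched.
    losses = [loss_fn([t + eps * u_j for t, u_j in zip(theta, u)]) for u in dirs]
    sigma = statistics.pstdev(losses)  # spread of the batch losses
    if sigma == 0.0:
        return list(theta)  # no usable signal in this batch; skip the update
    grad_sum = [0.0] * len(theta)
    for u, loss in zip(dirs, losses):
        diff = loss - base_loss  # one-sided difference, roughly eps * (u . grad)
        for j, u_j in enumerate(u):
            grad_sum[j] += diff * u_j
    # sigma ~ eps * ||grad||, so the step is roughly lr * grad / ||grad||.
    scale = lr / (n_perturb * sigma)
    return [t - scale * g for t, g in zip(theta, grad_sum)]


def sphere(theta):
    """Toy quadratic loss used only for illustration."""
    return sum(t * t for t in theta)


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [3.0, -2.0, 1.0, 4.0]  # sphere(theta) == 30.0 here
    for _ in range(500):
        theta = fzoo_style_step(sphere, theta, lr=0.05, eps=1e-3, n_perturb=16, rng=rng)
    print(sphere(theta))  # expected to be far below the starting 30.0
```

Each step costs N + 1 forward passes and no backward pass. Because the N perturbed evaluations do not depend on each other, a real implementation could run them as one batched forward pass, which is the kind of speedup the abstract attributes to Rademacher perturbations; the toy loop evaluates them one at a time for clarity.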
Cite
Text
Dang et al. "FZOO: Fast Zeroth-Order Optimizer for Fine‑Tuning Large Language Models Towards Adam‑Scale Speed." International Conference on Learning Representations, 2026.
Markdown
[Dang et al. "FZOO: Fast Zeroth-Order Optimizer for Fine‑Tuning Large Language Models Towards Adam‑Scale Speed." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/dang2026iclr-fzoo/)
BibTeX
@inproceedings{dang2026iclr-fzoo,
title = {{FZOO: Fast Zeroth-Order Optimizer for Fine‑Tuning Large Language Models Towards Adam‑Scale Speed}},
author = {Dang, Sizhe and Guo, Yangyang and Zhao, Yanjun and Zheng, Xiaodong and Dai, Guang and Tsang, Ivor and Ye, Haishan},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/dang2026iclr-fzoo/}
}