Three Forward, One Backward: Memory-Efficient Full-Rank Fine-Tuning of Large Models via Extra Forward Passes

Zhang, Jia; Bai, Yu; Zhang, Hualin; Chen, Tianshuo; Liu, Zhaogeng; Xu, Zhiqiang; Chang, Yi; Gu, Bin

Three Forward, One Backward: Memory-Efficient Full-Rank Fine-Tuning of Large Models via Extra Forward Passes

Jia Zhang, Yu Bai, Hualin Zhang, Tianshuo Chen, Zhaogeng Liu, Zhiqiang Xu, Yi Chang, Bin Gu

ICLR 2026

/iclr/2026/zhang2026iclr-three/

Abstract

Fine-tuning large language models (LLMs) has achieved significant success in downstream tasks. However, as the model size continues to grow, traditional fine-tuning methods have become increasingly impractical due to their high computational and memory costs. This has motivated researchers to explore parameter-efficient and memory-friendly fine-tuning strategies to enable scalable approaches, with Low-Rank Adaptation (LoRA) standing out as a representative work. However, the LoRA update is restricted to a low-rank subspace, which results in suboptimal performance compared to the full-parameter update. Recent research has also explored memory-efficient fine-tuning LLMs using just forward passes while suffer from high variance in gradient estimation and low convergence speed. To address the issues above, we propose a new alternating optimization framework called LMAO (**L**ow-rank and **M**emory-efficient Zeroth-Order **A**lternating **O**ptimization), which combines the advantages of LoRA and MeZO. This method alternately updates the low-rank components and zeroth-order directions during training. By performing three forward propagations and one backward propagation, each update is full-rank, thereby reducing feature loss and enabling efficient fine-tuning under strict memory constraints. We provide theoretical guarantees on the convergence and convergence rate of this method. Empirical results demonstrate that, in experiments on multiple models (e.g., OPT, RoBERTa-large), LMAO achieves performance comparable to first-order methods. This presents a practical and scalable solution for fine-tuning large-scale models. Our source code is available at [https://github.com/workelaina/LMAO](https://github.com/workelaina/LMAO).

PDF ICLR OpenReview Semantic Scholar

Cite

Text

Zhang et al. "Three Forward, One Backward: Memory-Efficient Full-Rank Fine-Tuning of Large Models via Extra Forward Passes." International Conference on Learning Representations, 2026.

Markdown

[Zhang et al. "Three Forward, One Backward: Memory-Efficient Full-Rank Fine-Tuning of Large Models via Extra Forward Passes." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-three/)

BibTeX

@inproceedings{zhang2026iclr-three,
  title     = {{Three Forward, One Backward: Memory-Efficient Full-Rank Fine-Tuning of Large Models via Extra Forward Passes}},
  author    = {Zhang, Jia and Bai, Yu and Zhang, Hualin and Chen, Tianshuo and Liu, Zhaogeng and Xu, Zhiqiang and Chang, Yi and Gu, Bin},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhang2026iclr-three/}
}