From Weight-Based to State-Based Fine-Tuning: Further Memory Reduction on LoRA with Parallel Control

Zhang, Chi; Lianhai, Ren; Cheng, Jingpu; Li, Qianxiao

ICML 2025 pp. 75453-75467

Abstract

LoRA has achieved notable success in reducing GPU memory usage by applying low-rank updates to weight matrices. Yet one simple question remains: can we push this reduction even further? And can we do so while also improving performance and reducing computation time? Answering these questions requires moving beyond the conventional weight-centric approach. In this paper, we present a state-based fine-tuning framework that shifts the focus from weight adaptation to optimizing forward states, with LoRA as a special case. Specifically, state-based tuning introduces parameterized perturbations to the states within the computational graph, allowing us to control states across an entire residual block. A key advantage of this approach is the potential to avoid storing large intermediate states in models like transformers. Empirical results across multiple architectures, including ViT, RoBERTa, LLaMA2-7B, and LLaMA3-8B, show that our method further reduces memory consumption and computation time while simultaneously improving performance. Moreover, as a result of the memory reduction, we explore the feasibility of training 7B/8B models on consumer-level GPUs such as the NVIDIA 3090, without model quantization. The code is available at an anonymous GitHub repository.
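
The contrast between weight-based and state-based tuning can be illustrated with a small NumPy sketch. This is not the authors' implementation: the toy residual block, the function names (`frozen_block`, `lora_block`, `state_block`), the sizes `d` and `r`, and the specific form of the state perturbation (a low-rank term `B @ (A @ x)` added in parallel to the whole block) are all assumptions made for illustration, since the abstract does not give the exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden width and adapter rank (illustrative values)

# Frozen "pretrained" weights of a toy two-layer residual block.
W1 = rng.standard_normal((d, d)) / np.sqrt(d)
W2 = rng.standard_normal((d, d)) / np.sqrt(d)


def frozen_block(x):
    """Toy residual block x + W2 @ relu(W1 @ x), standing in for an attention/MLP block."""
    return x + W2 @ np.maximum(W1 @ x, 0.0)


def lora_block(x, A, B):
    """Weight-based tuning: the low-rank update B @ A is added to the weight W1."""
    return x + W2 @ np.maximum((W1 + B @ A) @ x, 0.0)


def state_block(x, A, B):
    """State-based tuning (assumed form): a low-rank control B @ (A @ x)
    perturbs the block's output state, in parallel with the frozen block."""
    return frozen_block(x) + B @ (A @ x)


A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # zero init: both variants start at the pretrained function
x = rng.standard_normal(d)

# With B = 0, neither variant changes the pretrained output.
print(np.allclose(lora_block(x, A, B), frozen_block(x)))
print(np.allclose(state_block(x, A, B), frozen_block(x)))
```

In this sketch, the gradients of `state_block` with respect to `A` and `B` depend only on `x`, `A @ x`, and the gradient arriving at the block output, whereas `lora_block` also needs the activation inside the block. This is one plausible reading of how controlling a whole residual block could avoid storing intermediate states.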

Cite

Text

Zhang et al. "From Weight-Based to State-Based Fine-Tuning: Further Memory Reduction on LoRA with Parallel Control." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Zhang et al. "From Weight-Based to State-Based Fine-Tuning: Further Memory Reduction on LoRA with Parallel Control." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/zhang2025icml-weightbased/)

BibTeX

@inproceedings{zhang2025icml-weightbased,
  title     = {{From Weight-Based to State-Based Fine-Tuning: Further Memory Reduction on LoRA with Parallel Control}},
  author    = {Zhang, Chi and Lianhai, Ren and Cheng, Jingpu and Li, Qianxiao},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {75453-75467},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/zhang2025icml-weightbased/}
}