QeRL: Beyond Efficiency - Quantization-Enhanced Reinforcement Learning for LLMs

Abstract

We propose QeRL, a Quantization-enhanced Reinforcement Learning framework for large language models (LLMs). While RL is essential for LLMs' reasoning capabilities, it is resource-intensive, requiring substantial GPU memory and long rollout durations. QeRL addresses these issues by combining NVFP4 quantization with Low-Rank Adaptation (LoRA), accelerating the rollout phase of RL while reducing memory overhead. Beyond efficiency, our findings show that quantization noise increases policy entropy, enhancing exploration in LoRA-based RL and enabling the discovery of better strategies during training. To further optimize exploration, QeRL introduces an Adaptive Quantization Noise (AQN) mechanism, which dynamically adjusts noise throughout training. Experiments demonstrate that QeRL delivers over 1.5× speedup in the rollout phase compared to QLoRA, and around 1.3× speedup compared to BF16 LoRA on a 7B model. Moreover, QeRL is the first framework to enable RL training of a 32B LLM on a single H100 80GB GPU, while delivering overall speedups for RL training. It also achieves faster reward growth and higher final accuracy than 16-bit LoRA and QLoRA, while matching the performance of full-parameter fine-tuning on mathematical benchmarks such as GSM8K (90.8%) and MATH 500 (77.4%) with the 7B model. These results establish QeRL as an efficient and effective framework for RL training of LLMs.
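
To make the two ingredients named in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It rounds one block of weights to a 4-bit floating-point grid (a simplified stand-in for NVFP4 that ignores how the real format stores its block scales) and then applies a noise scale that shrinks over training stages. The exponential form of the schedule, the values of `sigma_start` and `sigma_end`, the choice to add noise directly to the weights, and all function names are assumptions made for illustration; the abstract only states that AQN adjusts noise dynamically.

```python
import math
import random

# Magnitudes representable by a 4-bit E2M1 float. Used here as a simplified
# stand-in for NVFP4; the real format also quantizes the per-block scale.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def fake_quantize_block(block):
    """Round one block of weights to the nearest scaled FP4 grid value."""
    amax = max(abs(w) for w in block)
    if amax == 0.0:
        return list(block)
    scale = amax / FP4_GRID[-1]  # map the largest magnitude onto 6.0
    out = []
    for w in block:
        nearest = min(FP4_GRID, key=lambda g: abs(abs(w) / scale - g))
        out.append(math.copysign(nearest * scale, w))
    return out


def noise_sigma(stage, num_stages, sigma_start=1e-2, sigma_end=5e-4):
    """Hypothetical schedule: exponential decay of the noise scale per stage."""
    if num_stages < 2:
        return sigma_start
    frac = stage / (num_stages - 1)
    return sigma_start * (sigma_end / sigma_start) ** frac


def perturb(weights, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma (assumed)."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # one 16-element block
quantized = fake_quantize_block(weights)
quant_error = max(abs(w - q) for w, q in zip(weights, quantized))

schedule = [noise_sigma(s, 5) for s in range(5)]  # larger noise early on
noisy = perturb(quantized, schedule[0], rng)
```

In this sketch the rounding error `quant_error` plays the role of the fixed quantization noise, while `schedule` adds a controllable component that starts large and decays, which is one plausible way to favor exploration early and stability later.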

Cite

Text

Huang et al. "QeRL: Beyond Efficiency - Quantization-Enhanced Reinforcement Learning for LLMs." International Conference on Learning Representations, 2026.

Markdown

[Huang et al. "QeRL: Beyond Efficiency - Quantization-Enhanced Reinforcement Learning for LLMs." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/huang2026iclr-qerl/)

BibTeX

@inproceedings{huang2026iclr-qerl,
  title     = {{QeRL: Beyond Efficiency - Quantization-Enhanced Reinforcement Learning for LLMs}},
  author    = {Huang, Wei and Ge, Yi and Yang, Shuai and Xiao, Yicheng and Mao, Huizi and Lin, Yujun and Ye, Hanrong and Liu, Sifei and Cheung, Ka Chun and Yin, Hongxu and Lu, Yao and Qi, Xiaojuan and Han, Song and Chen, Yukang},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/huang2026iclr-qerl/}
}