The Art of Scaling Reinforcement Learning Compute for LLMs

Devvrit, Fnu; Madaan, Lovish; Tiwari, Rishabh; Bansal, Rachit; Duvvuri, Sai Surya; Zaheer, Manzil; Dhillon, Inderjit S; Brandfonbrener, David; Agarwal, Rishabh

ICLR 2026

Abstract

Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training. Despite rapidly rising compute budgets, there is no principled understanding of how to evaluate algorithmic improvements for scaling RL compute. We present the first large-scale systematic study, amounting to more than 400,000 GPU-hours, that defines a principled framework for analyzing and predicting RL scaling in LLMs. We fit sigmoidal compute-performance curves for RL training and ablate a wide range of common design choices to analyze their effects on asymptotic performance and compute efficiency. We observe: (1) Not all recipes yield similar asymptotic performance, (2) Details such as loss aggregation, normalization, curriculum, and off-policy algorithm primarily modulate compute efficiency without materially shifting the asymptote, and (3) Stable, scalable recipes follow predictable scaling trajectories, enabling extrapolation from smaller-scale runs. Combining these insights, we propose a _best-practice_ recipe, ScaleRL, and demonstrate its effectiveness by successfully scaling and predicting validation performance on a single RL run scaled up to 100,000 GPU-hours. Our work provides both a _scientific framework_ for analyzing scaling in RL and a practical recipe that brings RL training closer to the predictability long achieved in pre-training.
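
As a rough illustration of what fitting a sigmoidal compute-performance curve and extrapolating from smaller-scale runs might look like, the sketch below fits a saturating curve to a handful of low-compute points and then evaluates it at a much larger budget. The abstract only states that the curves are sigmoidal; the specific functional form, the parameter names (`asymptote`, `midpoint`, `exponent`, `floor`), the grid-search fitting procedure, and all data points are assumptions made for this example and are not taken from the paper.

```python
import math


def sigmoid_curve(compute, asymptote, midpoint, exponent, floor=0.0):
    """Saturating curve in compute (assumed form): near `floor` when
    compute << midpoint, approaching `asymptote` when compute >> midpoint."""
    return floor + (asymptote - floor) / (1.0 + (midpoint / compute) ** exponent)


def fit_sigmoid(computes, scores, floor=0.0):
    """Coarse grid search for the least-squares (asymptote, midpoint, exponent).

    Illustrative only: a real fit would use a proper nonlinear optimizer.
    """
    best_params, best_error = None, math.inf
    for a_step in range(50, 101):            # asymptote in 0.50 .. 1.00
        asymptote = a_step / 100
        for m_step in range(4, 21):          # midpoint in 10^1 .. 10^5
            midpoint = 10 ** (m_step / 4)
            for b_step in range(5, 31):      # exponent in 0.5 .. 3.0
                exponent = b_step / 10
                error = sum(
                    (sigmoid_curve(c, asymptote, midpoint, exponent, floor) - s) ** 2
                    for c, s in zip(computes, scores)
                )
                if error < best_error:
                    best_params, best_error = (asymptote, midpoint, exponent), error
    return best_params


# Synthetic small-scale "runs" (made-up numbers, not results from the paper).
small_computes = [100, 200, 400, 800, 1600, 3200]
small_scores = [sigmoid_curve(c, 0.70, 1000.0, 1.5) for c in small_computes]

asymptote, midpoint, exponent = fit_sigmoid(small_computes, small_scores)

# Extrapolate the fitted curve to a much larger (hypothetical) compute budget.
predicted = sigmoid_curve(100_000, asymptote, midpoint, exponent)
print(asymptote, midpoint, exponent, round(predicted, 4))
```

In this parameterization, the asymptote sets the performance ceiling while the midpoint and exponent control how quickly a run approaches it. That separation mirrors the abstract's distinction between design choices that shift asymptotic performance and those that mainly affect compute efficiency.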

Cite

Text

Devvrit et al. "The Art of Scaling Reinforcement Learning Compute for LLMs." International Conference on Learning Representations, 2026.

Markdown

[Devvrit et al. "The Art of Scaling Reinforcement Learning Compute for LLMs." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/devvrit2026iclr-art/)

BibTeX

@inproceedings{devvrit2026iclr-art,
  title     = {{The Art of Scaling Reinforcement Learning Compute for LLMs}},
  author    = {Devvrit, Fnu and Madaan, Lovish and Tiwari, Rishabh and Bansal, Rachit and Duvvuri, Sai Surya and Zaheer, Manzil and Dhillon, Inderjit S and Brandfonbrener, David and Agarwal, Rishabh},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/devvrit2026iclr-art/}
}