Scaling Test-Time Compute Without Verification or RL Is Suboptimal

Abstract

Despite substantial advances in scaling test-time compute, the community continues to debate how it should be scaled to enable continued and efficient improvements. There are largely two approaches: (i) distilling successful search or thinking traces; and (ii) using verification (e.g., 0/1 outcome rewards or verifiers) to guide reinforcement learning (RL) and search algorithms. In this paper, we prove that finetuning LLMs with verifier-based (VB) methods built on RL or search is far superior to verifier-free (VF) approaches based on distilling or cloning search traces, given a fixed compute/data budget. Further, we show that as we scale test-time compute (measured as the output token length) and training data, the suboptimality of VF methods scales poorly compared to that of VB methods when the base pre-trained LLM presents a heterogeneous distribution over correct solution traces (e.g., different lengths, styles, etc.) and admits a non-sharp distribution over rewards on traces sampled from it. We formalize this condition using anti-concentration [Erdős 1945], which implies a stronger result: VB methods scale better asymptotically, with the performance gap between VB and VF widening as the test-time budget grows. We corroborate our theory empirically on didactic and math reasoning problems with 3/8/32B-sized pre-trained LLMs, where we find that verification is crucial for scaling test-time compute.

Cite

Text

Setlur et al. "Scaling Test-Time Compute Without Verification or RL Is Suboptimal." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Setlur et al. "Scaling Test-Time Compute Without Verification or RL Is Suboptimal." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/setlur2025icml-scaling/)

BibTeX

@inproceedings{setlur2025icml-scaling,
  title     = {{Scaling Test-Time Compute Without Verification or RL Is Suboptimal}},
  author    = {Setlur, Amrith and Rajaraman, Nived and Levine, Sergey and Kumar, Aviral},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {54058--54094},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/setlur2025icml-scaling/}
}