Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs Using Particle-Based Monte Carlo Methods

Abstract

Large language models (LLMs) have achieved significant performance gains via scaling up model sizes and/or data. However, recent evidence suggests diminishing returns from such approaches, motivating a pivot to scaling test-time compute. Existing deterministic inference-time scaling methods, usually with reward models, cast the task as a search problem, but suffer from a key limitation: early pruning. Due to inherently imperfect reward models, promising trajectories may be discarded prematurely, leading to suboptimal performance. We propose a novel inference-time scaling approach by adapting particle-based Monte Carlo methods. Our method maintains a diverse set of candidates and robustly balances exploration and exploitation. Our empirical evaluation demonstrates that our particle filtering methods have a 4--16x better scaling rate over deterministic search counterparts on both various challenging mathematical and more general reasoning tasks. Using our approach, we show that Qwen2.5-Math-1.5B-Instruct surpasses GPT-4o accuracy in only 4 rollouts, while Qwen2.5-Math-7B-Instruct scales to o1 level accuracy in only 32 rollouts. Our work not only presents an effective method to inference-time scaling, but also connects rich literature in probabilistic inference with inference-time scaling of LLMs to develop more robust algorithms in future work.

Cite

Text

Puri et al. "Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs Using Particle-Based Monte Carlo Methods." Advances in Neural Information Processing Systems, 2025.

Markdown

[Puri et al. "Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs Using Particle-Based Monte Carlo Methods." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/puri2025neurips-rollout/)

BibTeX

@inproceedings{puri2025neurips-rollout,
  title     = {{Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs Using Particle-Based Monte Carlo Methods}},
  author    = {Puri, Isha and Sudalairaj, Shivchander and Xu, Guangxuan and Bhandwaldar, Abhishek and Xu, Kai and Srivastava, Akash},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/puri2025neurips-rollout/}
}