Mode-Conditioning Unlocks Superior Test-Time Compute Scaling

Wu, Chen Henry; Goyal, Sachin; Raghunathan, Aditi

ICLR 2026

Abstract

Parallel sampling is essential to test-time scaling and reinforcement learning (RL), but its effectiveness is sharply limited by diversity collapse, where models concentrate on a few modes and repeated samples produce the same mistakes. We propose the mode-conditioning (ModC) framework, which explicitly allocates sampling compute across reasoning modes using either specialist models or mode-specific prefixes. With predefined mode labels, ModC consistently improves test-time scaling (Pass@k) across controlled graph-search tasks and math reasoning benchmarks, spanning model families and sizes from 0.5B to 7B. On OpenThoughts, fine-tuning Qwen2.5-7B with ModC achieves a 4× efficiency gain over standard training while also improving the maximum attainable Pass@k. We further show that gradient clustering enables ModC without predefined mode labels, yielding up to 10% gains on datasets such as NuminaMath. Finally, we show that ModC improves Pass@k after RL training and can further boost the Pass@k gains of diversity-inducing RL methods. These results demonstrate that standard training underutilizes the diversity in data, and that ModC provides a simple, effective remedy for unlocking the full benefits of diversity in parallel sampling.
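
As a rough illustration of the quantities the abstract refers to, the sketch below computes the commonly used unbiased Pass@k estimate and compares, on a toy problem, sampling from a collapsed mode distribution against splitting the same budget evenly across modes. This is not the authors' implementation. The helper names (`pass_at_k`, `split_budget`, `pass_at_k_iid`), the mode names, and all probabilities are assumptions invented for the example, and the even split is only one possible allocation rule.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimate from n samples, of which c are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def split_budget(k: int, modes: list[str]) -> dict[str, int]:
    """Divide k samples as evenly as possible across modes (hypothetical helper)."""
    base, extra = divmod(k, len(modes))
    return {m: base + (1 if i < extra else 0) for i, m in enumerate(modes)}


def pass_at_k_iid(p: float, k: int) -> float:
    """Chance that at least one of k independent samples succeeds."""
    return 1.0 - (1.0 - p) ** k


# Toy numbers, invented for illustration only (not from the paper).
solve_rate = {"mode_a": 0.0, "mode_b": 0.5}   # per-sample success on one problem
mode_prob = {"mode_a": 0.95, "mode_b": 0.05}  # collapsed model rarely uses mode_b
k = 8

# Standard sampling: every sample draws its mode from the collapsed distribution.
p_mix = sum(mode_prob[m] * solve_rate[m] for m in solve_rate)
standard = pass_at_k_iid(p_mix, k)

# Mode-conditioned sampling: the same budget is split evenly across modes.
budget = split_budget(k, list(solve_rate))
fail = 1.0
for mode, n_samples in budget.items():
    fail *= (1.0 - solve_rate[mode]) ** n_samples
mode_conditioned = 1.0 - fail

print(f"standard Pass@{k}: {standard:.3f}")
print(f"mode-conditioned Pass@{k}: {mode_conditioned:.3f}")
```

In this toy setting the problem is solvable only through the rarely sampled mode, so guaranteeing that mode a fixed share of the budget raises the chance that at least one sample succeeds.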

Cite

Text

Wu et al. "Mode-Conditioning Unlocks Superior Test-Time Compute Scaling." International Conference on Learning Representations, 2026.

Markdown

[Wu et al. "Mode-Conditioning Unlocks Superior Test-Time Compute Scaling." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wu2026iclr-modeconditioning/)

BibTeX

@inproceedings{wu2026iclr-modeconditioning,
  title     = {{Mode-Conditioning Unlocks Superior Test-Time Compute Scaling}},
  author    = {Wu, Chen Henry and Goyal, Sachin and Raghunathan, Aditi},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wu2026iclr-modeconditioning/}
}