Predicting LLM Reasoning Performance with Small Proxy Model
Abstract
Given the prohibitive cost of pre-training large language models, it is essential to leverage smaller proxy models to optimize recipes before scaling up. However, this approach becomes challenging for reasoning capabilities, which exhibit \textit{emergent} behavior that only appears reliably at larger model sizes, often exceeding 7B parameters. To address this, we introduce \tsc{rBridge}, showing that small proxies ($\leq$1B) can effectively predict large-model reasoning by aligning more closely with \textbf{(1)} the pre-training objective and \textbf{(2)} the target task. \tsc{rBridge} achieves this by weighting negative log-likelihood with task alignment, using reasoning traces from frontier models as gold labels. In our experiments, \tsc{rBridge} \textbf{(i)} reduces dataset ranking costs by over 100$\times$ relative to the best baseline, \textbf{(ii)} achieves the strongest correlation across six reasoning benchmarks at 1B to 32B scale, and \textbf{(iii)} transfers predictive relationships across pre-training recipes at 1B to 7B scale. These findings indicate that \tsc{rBridge} offers a practical path for exploring reasoning-oriented pre-training at lower cost.
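As a rough illustration of what "weighting negative log-likelihood" over a gold reasoning trace could look like, the sketch below computes a weighted average of per-token NLL. The abstract does not specify how the task-alignment weights are obtained, so the function name `weighted_nll`, its arguments, and the normalization by the weight sum are all assumptions made for illustration, not the paper's actual formulation.

```python
import math


def weighted_nll(token_logprobs, weights):
    """Weighted average negative log-likelihood over the tokens of a gold trace.

    token_logprobs: log-probabilities a proxy model assigns to each gold token.
    weights: non-negative per-token weights (hypothetical stand-in for a
             task-alignment weighting; how they are derived is not given here).
    """
    if len(token_logprobs) != len(weights):
        raise ValueError("token_logprobs and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Negate the weighted sum of log-probabilities, then normalize by total weight.
    return -sum(w * lp for w, lp in zip(weights, token_logprobs)) / total_weight


# Usage: two gold tokens with probabilities 0.5 and 0.25.
logprobs = [math.log(0.5), math.log(0.25)]
uniform = weighted_nll(logprobs, [1.0, 1.0])   # plain average NLL
focused = weighted_nll(logprobs, [1.0, 0.0])   # only the first token counts
```

With uniform weights this reduces to the ordinary mean NLL; setting a weight to zero removes that token from the score.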
Cite
Text
Koh et al. "Predicting LLM Reasoning Performance with Small Proxy Model." International Conference on Learning Representations, 2026.
Markdown
[Koh et al. "Predicting LLM Reasoning Performance with Small Proxy Model." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/koh2026iclr-predicting/)
BibTeX
@inproceedings{koh2026iclr-predicting,
title = {{Predicting LLM Reasoning Performance with Small Proxy Model}},
author = {Koh, Woosung and Suk, Juyoung and Han, Sungjun and Yun, Se-Young and Shin, Jay},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/koh2026iclr-predicting/}
}