How Do Large Language Monkeys Get Their Power (Laws)?
Abstract
Recent research across mathematical problem solving, proof assistant programming, and multimodal jailbreaking documents a striking finding: when (multimodal) language models tackle a suite of tasks with multiple attempts per task – succeeding if any attempt is correct – then the negative log of the average success rate scales as a power law in the number of attempts. In this work, we identify an apparent puzzle: a simple mathematical calculation predicts that on each problem, the failure rate should fall exponentially with the number of attempts. We confirm this prediction empirically, raising a question: from where does aggregate polynomial scaling emerge? We then answer this question by demonstrating that per-problem exponential scaling can be consistent with aggregate polynomial scaling if the distribution of single-attempt success probabilities is heavy-tailed, such that a small fraction of tasks with extremely low success probabilities collectively warps the aggregate success trend into a power law, even as each problem scales exponentially on its own. We further demonstrate that this distributional perspective explains previously observed deviations from power law scaling and provides a simple method for forecasting the power law exponent with an order of magnitude lower relative error, or, equivalently, ${\sim}2-4$ orders of magnitude less inference compute. Overall, our work contributes to a better understanding of how neural language model performance improves with scaling inference compute and to the development of scaling-predictable evaluations of (multimodal) language models.
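The mechanism described in the abstract can be illustrated with a short simulation: for a single problem with success probability $p$, the failure rate after $k$ independent attempts is $(1-p)^k$, which decays exponentially in $k$; but if single-attempt success probabilities are heavy-tailed near zero (density roughly $\propto p^{b-1}$), the averaged failure rate $\mathbb{E}[(1-p)^k]$ decays only polynomially, as roughly $k^{-b}$. The Python sketch below is a minimal illustration under an assumed Beta(0.3, 3) distribution of success probabilities; it is not the paper's code or fitted distribution.

```python
# Minimal simulation of the abstract's claim: per-problem failure is exactly
# exponential in the number of attempts k, yet averaging over a heavy-tailed
# distribution of single-attempt success probabilities yields power-law scaling.
# The Beta(0.3, 3) distribution is an illustrative assumption only.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.3, 3.0                      # small alpha => heavy mass near p = 0
p = rng.beta(alpha, beta, size=200_000)     # single-attempt success prob per task

ks = np.unique(np.logspace(0, 4, 30).astype(int))

# Per-problem: failure (1 - p_i)^k decays exponentially in k.
p_single = 0.05
per_problem_failure = (1 - p_single) ** ks

# Aggregate: average failure rate across tasks, E[(1 - p)^k].
agg_failure = np.array([np.mean((1 - p) ** k) for k in ks])

# Since -log(average success) ~ average failure when success is close to 1,
# a power law in the aggregate failure gives the scaling seen in the abstract.
mask = ks > 100                             # restrict to the asymptotic regime
slope, _ = np.polyfit(np.log(ks[mask]), np.log(agg_failure[mask]), 1)
print(f"fitted log-log slope: {slope:.2f} (Beta heavy-tail prediction: {-alpha})")
```

Running this prints a fitted slope close to $-0.3$, matching the tail exponent of the assumed Beta distribution, while `per_problem_failure` falls off exponentially, showing how the two scaling behaviors coexist.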
Cite
Text

Schaeffer et al. "How Do Large Language Monkeys Get Their Power (Laws)?" Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Schaeffer et al. "How Do Large Language Monkeys Get Their Power (Laws)?" Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/schaeffer2025icml-large/)

BibTeX
@inproceedings{schaeffer2025icml-large,
  title     = {{How Do Large Language Monkeys Get Their Power (Laws)?}},
  author    = {Schaeffer, Rylan and Kazdan, Joshua and Hughes, John and Juravsky, Jordan and Price, Sara and Lynch, Aengus and Jones, Erik and Kirk, Robert and Mirhoseini, Azalia and Koyejo, Sanmi},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {53132--53176},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/schaeffer2025icml-large/}
}