rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Abstract

We present rStar-Math to demonstrate that small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI o1, without distillation from superior models. rStar-Math achieves this by exercising “deep thinking” through Monte Carlo Tree Search (MCTS), where a math policy SLM performs test-time search guided by an SLM-based process reward model. rStar-Math introduces three innovations to tackle the challenges in training the two SLMs: (1) a novel code-augmented CoT data synthesis method, which performs extensive MCTS rollouts to generate step-by-step verified reasoning trajectories used to train the policy SLM; (2) a novel process reward model training method that avoids naïve step-level score annotation, yielding a more effective process preference model (PPM); (3) a self-evolution recipe in which the policy SLM and PPM are built from scratch and iteratively evolved to improve reasoning capabilities. Through 4 rounds of self-evolution with millions of synthesized solutions for 747k math problems, rStar-Math boosts SLMs’ math reasoning to state-of-the-art levels. On the MATH benchmark, it improves Qwen2.5-Math-7B from 58.8% to 90.0%, surpassing o1-preview by +4.5%. On the USA Math Olympiad (AIME), rStar-Math solves an average of 53.3% (8/15) of problems, ranking among the top 20% of the brightest high school math students. Code and data are available at https://github.com/microsoft/rStar.
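The abstract does not spell out the search procedure, so the following is only a minimal sketch of the selection and backpropagation steps in a generic MCTS, assuming the standard UCT rule. The `Node` class, the `uct`, `select_child`, and `backpropagate` helpers, and the exploration constant `c=1.4` are hypothetical names and values chosen for illustration; they are not taken from the rStar-Math code. In a system like the one described, the reward passed to `backpropagate` would plausibly come from a process reward model scoring a reasoning step.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One candidate reasoning step in the search tree (illustrative only)."""
    step: str
    q_sum: float = 0.0   # accumulated reward from rollouts through this node
    visits: int = 0      # number of rollouts through this node
    children: List["Node"] = field(default_factory=list)


def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCT score: mean reward plus an exploration bonus.

    Unvisited children score +inf so each is tried at least once.
    """
    if child.visits == 0:
        return math.inf
    mean_reward = child.q_sum / child.visits
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return mean_reward + bonus


def select_child(parent: Node, c: float = 1.4) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct(ch, parent.visits, c))


def backpropagate(path: List[Node], reward: float) -> None:
    """Add one rollout's reward to every node on the path from the root."""
    for node in path:
        node.visits += 1
        node.q_sum += reward


# Usage: two candidate steps, one rewarded and one not.
root = Node("problem statement")
good, bad = Node("step A"), Node("step B")
root.children = [good, bad]
backpropagate([root, good], reward=1.0)  # hypothetical reward-model score
backpropagate([root, bad], reward=0.0)
best = select_child(root)
```

After these two rollouts both children have the same exploration bonus, so selection falls to the child with the higher mean reward.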

Cite

Text

Guan et al. "rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Guan et al. "rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/guan2025icml-rstarmath/)

BibTeX

@inproceedings{guan2025icml-rstarmath,
  title     = {{rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking}},
  author    = {Guan, Xinyu and Zhang, Li Lyna and Liu, Yifei and Shang, Ning and Sun, Youran and Zhu, Yi and Yang, Fan and Yang, Mao},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {20640--20661},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/guan2025icml-rstarmath/}
}