AdaptiveStep: Automatically Dividing Reasoning Step Through Model Confidence

Abstract

Current approaches for training Process Reward Models (PRMs) often decompose responses into multiple reasoning steps using rule-based techniques, such as splitting at predefined placeholder tokens or fixing each reasoning step's length. These approaches overlook the fact that certain words do not usually mark true decision points. To address this, we propose AdaptiveStep, a method that divides reasoning steps based on the model's confidence in predicting the next word, so that each step carries more decision-making information, which benefits downstream tasks such as reward model training. Moreover, our method requires no manual annotation. Experiments with AdaptiveStep-trained PRMs in mathematical reasoning and code generation show that the outcome PRM achieves state-of-the-art Best-of-N performance, surpassing the greedy search strategy with token-level value-guided decoding, while also reducing construction costs by over 30% compared to existing open-source PRMs. We also provide a thorough analysis and case study on its performance, transferability, and generalization capabilities. Our code is available at https://github.com/Lux0926/ASPRM.
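The core idea of splitting a response where the model is least sure of the next token can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_by_confidence`, the fixed `threshold` parameter, and the choice to start a new step at the low-confidence token are all assumptions made for illustration, and the per-token confidences are assumed to be supplied already (for example, the probability the model assigned to each generated token).

```python
from typing import List, Sequence


def split_by_confidence(
    tokens: Sequence[str],
    confidences: Sequence[float],
    threshold: float,
) -> List[List[str]]:
    """Split a token sequence into steps at low-confidence tokens.

    Hypothetical helper: a token whose confidence falls below `threshold`
    is treated as a decision point and begins a new step.
    """
    if len(tokens) != len(confidences):
        raise ValueError("tokens and confidences must have the same length")

    steps: List[List[str]] = []
    current: List[str] = []
    for token, confidence in zip(tokens, confidences):
        # Close the current step just before a low-confidence token,
        # unless the step is still empty (avoids emitting empty steps).
        if confidence < threshold and current:
            steps.append(current)
            current = []
        current.append(token)

    # Flush whatever remains as the final step.
    if current:
        steps.append(current)
    return steps


# Toy example with made-up confidences (not data from the paper).
example_tokens = ["2", "+", "3", "=", "5", "so", "x", "=", "5"]
example_confidences = [0.9, 0.95, 0.9, 0.99, 0.4, 0.3, 0.9, 0.95, 0.97]
example_steps = split_by_confidence(example_tokens, example_confidences, 0.5)
```

With the toy values above, the sequence is cut before the two low-confidence tokens, giving three steps. How the threshold is actually chosen is not stated in this abstract, so the constant `0.5` here is only a placeholder.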

Cite

Text

Liu et al. "AdaptiveStep: Automatically Dividing Reasoning Step Through Model Confidence." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Liu et al. "AdaptiveStep: Automatically Dividing Reasoning Step Through Model Confidence." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/liu2025icml-adaptivestep/)

BibTeX

@inproceedings{liu2025icml-adaptivestep,
  title     = {{AdaptiveStep: Automatically Dividing Reasoning Step Through Model Confidence}},
  author    = {Liu, Yuliang and Lu, Junjie and Qu, Chaofeng and Chen, Zhaoling and Cai, Zefan and Liu, Jason Klein and Liu, Chonghan and Xia, Yunhui and Zhao, Li and Bian, Jiang and Zhang, Chuheng and Shen, Wei and Lin, Zhouhan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {39016-39031},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/liu2025icml-adaptivestep/}
}