OR-PRM: A Process Reward Model for Algorithmic Problem in Operations Research

Wang, Yilin; Zhou, Heng; Mao, Dongxing; Li, Linjie; Tan, Jingru; Han, Haochen; Yang, Zhengyuan; Wang, Alex Jinpeng; Li, Min

ICLR 2026

Abstract

Large language models (LLMs) equipped with Process Reward Models (PRMs) have shown strong reasoning ability, yet their potential in Operations Research (OR) remains unexplored. We present the first PRM tailored to OR, but find that training it directly on mainstream datasets yields surprisingly weak performance. To understand this gap, we conduct a systematic analysis and identify the primary bottleneck: the datasets themselves, in which over 30% of annotations are severely flawed. To overcome these limitations, we first collect all existing synthetic datasets and apply a carefully designed filtering pipeline to construct a high-quality seed dataset. From this seed, we build OR-ProcessQA, the first large-scale OR dataset with step-by-step supervision, in which diverse solution pathways are generated via Monte Carlo Tree Search (MCTS) and each step is validated for logical consistency by GPT-4o. On this foundation, we train OR-PRM, the first Process Reward Model in the OR domain, which evaluates and guides reasoning at every step rather than only the final outcome. Together, these advances enable OR-PRM to substantially improve LLMs’ reasoning capability, achieving a maximum absolute improvement of 12.5% over the base model in Best-of-N settings and highlighting the power of process-oriented supervision for reliable problem solving in operations research.
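
To make the Best-of-N setting concrete, the sketch below shows how a process reward model can rerank N sampled solutions: each step prefix is scored, the step scores are aggregated into one solution score, and the highest-scoring candidate is kept. This is a minimal illustration under stated assumptions, not the authors' implementation. The scorer `prm`, the helper names `score_solution` and `best_of_n`, and the aggregation options (`min`, `mean`, `last`) are hypothetical; the abstract does not say how OR-PRM combines per-step scores, and taking the minimum is only one common choice.

```python
from typing import Callable, List, Sequence

# A candidate solution is a list of reasoning steps.
Solution = List[str]
# Hypothetical PRM interface: (problem, step prefix) -> score in [0, 1].
StepScorer = Callable[[str, Sequence[str]], float]


def score_solution(problem: str, steps: Solution, prm: StepScorer,
                   agg: str = "min") -> float:
    """Score every step prefix with the PRM, then aggregate (assumed rule)."""
    step_scores = [prm(problem, steps[: i + 1]) for i in range(len(steps))]
    if not step_scores:
        return 0.0
    if agg == "min":    # a solution is only as good as its weakest step
        return min(step_scores)
    if agg == "mean":
        return sum(step_scores) / len(step_scores)
    if agg == "last":   # score of the final, complete prefix
        return step_scores[-1]
    raise ValueError(f"unknown aggregation: {agg}")


def best_of_n(problem: str, candidates: List[Solution], prm: StepScorer,
              agg: str = "min") -> Solution:
    """Return the candidate with the highest aggregated process reward."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda s: score_solution(problem, s, prm, agg))


# Toy stand-in for a trained PRM: penalize any step that contains "error".
def toy_prm(problem: str, prefix: Sequence[str]) -> float:
    return 0.1 if "error" in prefix[-1] else 0.9


flawed = ["let x >= 0", "maximize 3x", "error: drop x <= 4", "x = 4, obj = 12"]
sound = ["let x >= 0", "maximize 3x", "add x <= 4", "x = 4, obj = 12"]
chosen = best_of_n("toy LP", [flawed, sound], toy_prm)
```

Both toy candidates reach the same final answer, so an outcome-only check cannot separate them; the step-level scores do, and `chosen` is the `sound` solution. This mirrors the abstract's point that supervising every step, rather than only the final outcome, is what lets the reward model filter out reasoning with invalid intermediate steps.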

Cite

Text

Wang et al. "OR-PRM: A Process Reward Model for Algorithmic Problem in Operations Research." International Conference on Learning Representations, 2026.

Markdown

[Wang et al. "OR-PRM: A Process Reward Model for Algorithmic Problem in Operations Research." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wang2026iclr-orprm/)

BibTeX

@inproceedings{wang2026iclr-orprm,
  title     = {{OR-PRM: A Process Reward Model for Algorithmic Problem in Operations Research}},
  author    = {Wang, Yilin and Zhou, Heng and Mao, Dongxing and Li, Linjie and Tan, Jingru and Han, Haochen and Yang, Zhengyuan and Wang, Alex Jinpeng and Li, Min},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wang2026iclr-orprm/}
}