Progress Reward Model for Reinforcement Learning via Large Language Models
Abstract
Traditional reinforcement learning (RL) algorithms face significant limitations in long-horizon tasks with sparse rewards. Recent advances have leveraged large language models (LLMs) to enhance RL by exploiting their world knowledge for task planning and reward generation. However, planning-based approaches often depend on predefined skill libraries and fail to optimize low-level control policies, while reward-based methods require extensive human feedback or exhaustive search due to task complexity. In this paper, we propose the Progress Reward Model for RL (PRM4RL), a novel framework that integrates task planning and dense reward generation to enhance RL. For high-level planning, a complex task is decomposed into a series of simple, manageable subtasks, and a subtask-oriented, fine-grained progress function is designed to monitor execution progress. For low-level reward generation, inspired by potential-based reward shaping, we use the progress function to construct a Progress Reward Model (PRM) with theoretically grounded optimality and convergence guarantees, enabling effective policy optimization. Experimental results on robotic control tasks demonstrate that our approach outperforms both LLM-based planning and reward methods, achieving state-of-the-art performance.
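The abstract's core mechanism, using a task-progress function as the potential in potential-based reward shaping, can be illustrated with a short sketch. The code below is a minimal illustration under our own assumptions, not the authors' implementation: the class name ProgressRewardModel, the per-subtask progress callables, and the clamping/averaging scheme are hypothetical stand-ins for the paper's LLM-generated progress function.

# Minimal sketch of a progress-based potential reward, assuming the PRM
# follows standard potential-based shaping:
#     r'(s, a, s') = r(s, a, s') + gamma * Phi(s') - Phi(s),
# with the potential Phi given by subtask progress. All names here are
# illustrative, not the paper's released code.

from typing import Callable, Sequence


class ProgressRewardModel:
    """Shapes environment rewards with a subtask-oriented progress potential."""

    def __init__(
        self,
        subtask_progress_fns: Sequence[Callable[[dict], float]],
        gamma: float = 0.99,
    ) -> None:
        # One progress function per planned subtask; each maps a state to a
        # completion fraction in [0, 1].
        self.subtask_progress_fns = subtask_progress_fns
        self.gamma = gamma

    def potential(self, state: dict) -> float:
        # Overall progress Phi(s): completed subtasks count fully, the
        # current subtask contributes its partial progress.
        total = 0.0
        for fn in self.subtask_progress_fns:
            p = max(0.0, min(1.0, fn(state)))
            total += p
            if p < 1.0:
                break  # later subtasks cannot start before this one finishes
        return total / len(self.subtask_progress_fns)

    def shaped_reward(self, state: dict, next_state: dict, env_reward: float) -> float:
        # Pure potential difference: optimality-preserving by Ng et al. (1999).
        return env_reward + self.gamma * self.potential(next_state) - self.potential(state)


# Usage: two toy subtasks for a reach-then-grasp robot task.
if __name__ == "__main__":
    reach = lambda s: 1.0 - min(1.0, s["dist_to_object"])
    grasp = lambda s: 1.0 if s["gripper_closed"] else 0.0
    prm = ProgressRewardModel([reach, grasp], gamma=0.99)
    s0 = {"dist_to_object": 0.8, "gripper_closed": False}
    s1 = {"dist_to_object": 0.3, "gripper_closed": False}
    print(prm.shaped_reward(s0, s1, env_reward=0.0))  # positive: progress was made

Because the shaping term is a pure potential difference, the classical result of Ng et al. (1999) guarantees the shaped MDP shares its optimal policy with the original one, which is the kind of optimality and convergence guarantee the abstract refers to.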
Cite
Text
Zhang et al. "Progress Reward Model for Reinforcement Learning via Large Language Models." Advances in Neural Information Processing Systems, 2025.
Markdown
[Zhang et al. "Progress Reward Model for Reinforcement Learning via Large Language Models." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhang2025neurips-progress/)
BibTeX
@inproceedings{zhang2025neurips-progress,
title = {{Progress Reward Model for Reinforcement Learning via Large Language Models}},
author = {Zhang, Xiuhui and Gao, Ning and Jiang, Xingyu and Chen, Yihui and Pan, Yuheng and Zhang, Mohan and Deng, Yue},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/zhang2025neurips-progress/}
}