Progress Reward Model for Reinforcement Learning via Large Language Models
Abstract
Traditional reinforcement learning (RL) algorithms face significant limitations in long-horizon tasks with sparse rewards. Recent advances have leveraged large language models (LLMs) to enhance RL by exploiting their world knowledge for task planning and reward generation. However, planning-based approaches often depend on predefined skill libraries and fail to optimize low-level control policies, while reward-based methods require extensive human feedback or exhaustive search due to task complexity. In this paper, we propose the Progress Reward Model for RL (PRM4RL), a novel framework that integrates task planning and dense reward generation to enhance RL. For high-level planning, a complex task is decomposed into a series of simple, manageable subtasks, and a subtask-oriented, fine-grained progress function is designed to monitor execution progress. For low-level reward generation, inspired by potential-based reward shaping, we use the progress function to construct a Progress Reward Model (PRM) with theoretically grounded optimality and convergence guarantees, enabling effective policy optimization. Experimental results on robotic control tasks demonstrate that our approach outperforms both LLM-based planning and reward methods, achieving state-of-the-art performance.
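The abstract's core mechanism, using a task-progress function as the potential in potential-based reward shaping, can be illustrated with a short sketch. The code below is a minimal illustration under our own assumptions, not the authors' implementation: the class name ProgressRewardModel, the per-subtask progress callables, and the clamping/averaging scheme are hypothetical stand-ins for the paper's LLM-generated progress function.

# Minimal sketch of a progress-based potential reward, assuming the PRM
# follows standard potential-based shaping:
#     r'(s, a, s') = r(s, a, s') + gamma * Phi(s') - Phi(s),
# with the potential Phi given by subtask progress. All names here are
# illustrative, not the paper's released code.

from typing import Callable, Sequence


class ProgressRewardModel:
    """Shapes environment rewards with a subtask-oriented progress potential."""

    def __init__(
        self,
        subtask_progress_fns: Sequence[Callable[[dict], float]],
        gamma: float = 0.99,
    ) -> None:
        # One progress function per planned subtask; each maps a state to a
        # completion fraction in [0, 1].
        self.subtask_progress_fns = subtask_progress_fns
        self.gamma = gamma

    def potential(self, state: dict) -> float:
        # Overall progress Phi(s): completed subtasks count fully, the
        # current subtask contributes its partial progress.
        total = 0.0
        for fn in self.subtask_progress_fns:
            p = max(0.0, min(1.0, fn(state)))
            total += p
            if p < 1.0:
                break  # later subtasks cannot start before this one finishes
        return total / len(self.subtask_progress_fns)

    def shaped_reward(self, state: dict, next_state: dict, env_reward: float) -> float:
        # Pure potential difference: optimality-preserving by Ng et al. (1999).
        return env_reward + self.gamma * self.potential(next_state) - self.potential(state)


# Usage: two toy subtasks for a reach-then-grasp robot task.
if __name__ == "__main__":
    reach = lambda s: 1.0 - min(1.0, s["dist_to_object"])
    grasp = lambda s: 1.0 if s["gripper_closed"] else 0.0
    prm = ProgressRewardModel([reach, grasp], gamma=0.99)
    s0 = {"dist_to_object": 0.8, "gripper_closed": False}
    s1 = {"dist_to_object": 0.3, "gripper_closed": False}
    print(prm.shaped_reward(s0, s1, env_reward=0.0))  # positive: progress was made

Because the shaping term is a pure potential difference, the classical result of Ng et al. (1999) guarantees the shaped MDP shares its optimal policy with the original one, which is the kind of optimality and convergence guarantee the abstract refers to.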
Cite
Text
Zhang et al. "Progress Reward Model for Reinforcement Learning via Large Language Models." Advances in Neural Information Processing Systems, 2025.
Markdown
[Zhang et al. "Progress Reward Model for Reinforcement Learning via Large Language Models." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhang2025neurips-progress/)
BibTeX
@inproceedings{zhang2025neurips-progress,
title = {{Progress Reward Model for Reinforcement Learning via Large Language Models}},
author = {Zhang, Xiuhui and Gao, Ning and Jiang, Xingyu and Chen, Yihui and Pan, Yuheng and Zhang, Mohan and Deng, Yue},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/zhang2025neurips-progress/}
}