Process vs. Outcome Reward: Which Is Better for Agentic RAG Reinforcement Learning

Abstract

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by integrating external knowledge, yet traditional RAG systems struggle with static workflows and limited adaptability for complex, multi-step reasoning tasks. Agentic RAG systems, such as DeepResearch, address these issues through dynamic retrieval, iterative context refinement, and adaptive workflows. However, recent methods like Search-R1, which rely on outcome-based reinforcement learning, face challenges such as low exploration efficiency, gradient conflict, and sparse reward signals. To tackle these limitations, we introduce ReasonRAG, a novel method that leverages RAG-ProGUIDE—a high-quality dataset providing fine-grained, process-level rewards for query generation, evidence extraction, and answer generation. By employing process-supervised reinforcement learning, ReasonRAG enhances LLMs’ autonomous capabilities in search, query generation, evidence extraction, and answer synthesis. Experimental results show that ReasonRAG, utilizing RAG-ProGUIDE, outperforms existing approaches like Search-R1 and traditional RAG systems, achieving superior performance on five benchmark datasets with only 5k training instances—significantly fewer than the 90k required by Search-R1. Our code is available at https://github.com/Applied-Machine-Learning-Lab/ReasonRAG.
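The core contrast the abstract draws — sparse outcome rewards versus dense process-level rewards over the query-generation, evidence-extraction, and answer-generation steps — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the step names mirror the abstract, but the scoring functions and data structures are assumptions for exposition (a real system would use graded annotations such as RAG-ProGUIDE labels or a learned reward model).

```python
from typing import Callable

# A trajectory is a list of (step_type, step_output) pairs.
Trajectory = list[tuple[str, str]]

def outcome_reward(trajectory: Trajectory, answer_correct: bool) -> list[float]:
    """Outcome supervision: one sparse signal, assigned only at the final step."""
    rewards = [0.0] * len(trajectory)
    rewards[-1] = 1.0 if answer_correct else 0.0
    return rewards

def process_reward(
    trajectory: Trajectory,
    step_scorers: dict[str, Callable[[str], float]],
) -> list[float]:
    """Process supervision: a dense score for every intermediate step."""
    return [step_scorers[step_type](output) for step_type, output in trajectory]

# Toy trajectory using the three step types named in the abstract.
traj: Trajectory = [
    ("query_generation", "Who discovered penicillin?"),
    ("evidence_extraction", "Alexander Fleming discovered penicillin in 1928."),
    ("answer_generation", "Alexander Fleming"),
]

# Placeholder per-step scorers (illustrative constants only).
scorers: dict[str, Callable[[str], float]] = {
    "query_generation": lambda s: 0.9,
    "evidence_extraction": lambda s: 0.8,
    "answer_generation": lambda s: 1.0,
}

print(outcome_reward(traj, answer_correct=True))  # signal only at the last step
print(process_reward(traj, scorers))              # feedback at every step
```

The intuition: with outcome supervision, early steps (a bad query, a missed passage) receive no direct credit or blame, which the abstract ties to low exploration efficiency and sparse rewards; process supervision gives every step its own signal.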

Cite

Text

Zhang et al. "Process vs. Outcome Reward: Which Is Better for Agentic RAG Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.

Markdown

[Zhang et al. "Process vs. Outcome Reward: Which Is Better for Agentic RAG Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhang2025neurips-process/)

BibTeX

@inproceedings{zhang2025neurips-process,
  title     = {{Process vs. Outcome Reward: Which Is Better for Agentic RAG Reinforcement Learning}},
  author    = {Zhang, Wenlin and Li, Xiangyang and Dong, Kuicai and Wang, Yichao and Jia, Pengyue and Li, Xiaopeng and Zhang, Yingyi and Xu, Derong and Du, Zhaocheng and Guo, Huifeng and Tang, Ruiming and Zhao, Xiangyu},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/zhang2025neurips-process/}
}