HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation

Wu, Peilin; Zhang, Mian; Wan, Kun; Zhao, Wentian; He, Kaiyu; Du, Xinya; Chen, Zhiyu

ICLR 2026

Abstract

Agentic Retrieval-Augmented Generation (RAG) is a powerful technique for incorporating external information that Large Language Models (LLMs) lack, enabling better problem solving and question answering. However, suboptimal search behaviors are widespread, such as over-search (retrieving information already known) and under-search (failing to search when necessary), which lead to unnecessary overhead and unreliable outputs. Current training methods, which typically rely on outcome-based rewards in a Reinforcement Learning (RL) framework, lack the fine-grained control needed to address these inefficiencies. To overcome this, we introduce Hierarchical Process Rewards for Efficient agentic RAG (HiPRAG), a novel training methodology that incorporates a fine-grained, knowledge-grounded process reward into the RL training. Our approach evaluates the necessity of each search decision on-the-fly by decomposing the agent's reasoning trajectory into discrete, parsable steps. We then apply a hierarchical reward function that provides an additional bonus based on the proportion of optimal search and non-search steps, on top of commonly used outcome and format rewards. Experiments on the Qwen2.5 and Llama-3.2 models across seven diverse QA benchmarks show that our method achieves average accuracies of 65.4% (3B) and 67.2% (7B), outperforming strong agentic RAG baselines. This is accomplished while dramatically improving search efficiency, reducing the over-search rate from over 27% in baselines from previous work to just 2.3% and concurrently lowering the under-search rate. These results demonstrate the efficacy of optimizing the reasoning process itself, not just the final outcome. Further experiments and analysis demonstrate that HiPRAG generalizes well across a wide range of RL algorithms, model families, sizes, and types. This work demonstrates the importance and potential of fine-grained control through RL for improving the efficiency and optimality of reasoning for search agents.
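The reward structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Step` structure, the function names, the weights `format_weight` and `bonus_weight`, and the rule that the process bonus is only granted when the answer is correct and the format is valid are all assumptions made for illustration; the abstract states only that a bonus based on the proportion of optimal search and non-search steps is added on top of outcome and format rewards.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One parsed step of a reasoning trajectory (hypothetical structure)."""
    is_search: bool   # True if the step issued a retrieval call
    is_optimal: bool  # True if the search / non-search decision was judged appropriate


def optimal_step_ratio(steps: List[Step]) -> float:
    """Fraction of steps (search and non-search) judged optimal."""
    if not steps:
        return 0.0
    return sum(s.is_optimal for s in steps) / len(steps)


def over_search_rate(steps: List[Step]) -> float:
    """Fraction of search steps that were judged unnecessary."""
    searches = [s for s in steps if s.is_search]
    if not searches:
        return 0.0
    return sum(not s.is_optimal for s in searches) / len(searches)


def hierarchical_reward(
    answer_correct: bool,
    format_ok: bool,
    steps: List[Step],
    format_weight: float = 0.25,  # assumed value, not from the paper
    bonus_weight: float = 0.5,    # assumed value, not from the paper
) -> float:
    """Outcome reward plus format reward, with a gated process bonus.

    Assumption: the bonus is applied only when the answer is correct
    and the output is well-formed.
    """
    base = float(answer_correct) + format_weight * float(format_ok)
    if answer_correct and format_ok:
        return base + bonus_weight * optimal_step_ratio(steps)
    return base


# Example: four steps, one of which is an unnecessary search.
trajectory = [
    Step(is_search=True, is_optimal=True),
    Step(is_search=False, is_optimal=True),
    Step(is_search=True, is_optimal=False),
    Step(is_search=False, is_optimal=True),
]
reward = hierarchical_reward(True, True, trajectory)  # 1.0 + 0.25 + 0.5 * 0.75
```

Under these assumed weights, a trajectory with a wrong answer receives no process bonus regardless of how efficient its searches were, which keeps the bonus from competing with the outcome reward.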

Cite

Text

Wu et al. "HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation." International Conference on Learning Representations, 2026.

Markdown

[Wu et al. "HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wu2026iclr-hiprag/)

BibTeX

@inproceedings{wu2026iclr-hiprag,
  title     = {{HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation}},
  author    = {Wu, Peilin and Zhang, Mian and Wan, Kun and Zhao, Wentian and He, Kaiyu and Du, Xinya and Chen, Zhiyu},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wu2026iclr-hiprag/}
}