WebSeer: Training Deeper Search Agents Through Reinforcement Learning with Self-Reflection

Abstract

Search agents have achieved significant advancements in enabling intelligent information retrieval and decision-making within interactive environments. Although reinforcement learning has been employed to train agentic models capable of more dynamic interactive retrieval, existing methods are limited by shallow tool-use depth and the accumulation of errors over multiple iterative interactions. In this paper, we present WebSeer, a more intelligent search agent trained via reinforcement learning enhanced with a self-reflection mechanism. Specifically, we construct a large dataset annotated with reflection patterns and design a two-stage training framework that unifies cold start and reinforcement learning within the self-reflection paradigm for real-world web-based environments, which enables the model to generate longer and more reflective tool-use trajectories. Our approach substantially extends tool-use chains and improves answer accuracy. Using a single 14B model, we achieve state-of-the-art results on HotpotQA and SimpleQA, with accuracies of 72.3% and 90.0%, respectively, and demonstrate strong generalization to out-of-distribution datasets.

Cite

Text

He et al. "WebSeer: Training Deeper Search Agents Through Reinforcement Learning with Self-Reflection." International Conference on Learning Representations, 2026.

Markdown

[He et al. "WebSeer: Training Deeper Search Agents Through Reinforcement Learning with Self-Reflection." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/he2026iclr-webseer/)

BibTeX

@inproceedings{he2026iclr-webseer,
  title     = {{WebSeer: Training Deeper Search Agents Through Reinforcement Learning with Self-Reflection}},
  author    = {He, Guanzhong and Yang, Zhen and Liu, Jinxin and Xu, Bin and Hou, Lei and Li, Juanzi},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/he2026iclr-webseer/}
}