WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

Li, Kuan; Zhang, Zhongwang; Yin, Huifeng; Ye, Rui; Zhao, Yida; Zhang, Liwen; Ou, Litu; Zhang, Ding-Chu; Wu, Xixi; Yu, Xinmiao; Wu, Jialong; Wang, Xinyu; Qiao, Zile; Zhang, Zhen; Jiang, Yong; Xie, Pengjun; Huang, Fei; Xu, Zhi-Qin John; Wang, Shuai; Cheng, Minhao; Zhou, Jingren

ICLR 2026

Abstract

To significantly advance the capabilities of open-source web agents, we present WebSailor-V2, a complete post-training pipeline encompassing data construction, Supervised Fine-Tuning (SFT), and Reinforcement Learning (RL). Our methodology features two key innovations: (1) On the data front, we developed SailorFog-QA-2, a novel dataset built from a densely interconnected knowledge graph that introduces a wide variety of uncertainties beyond simple obfuscation, fostering more sophisticated reasoning. (2) For training, we engineered a dual-environment RL framework, combining a high-fidelity simulator for rapid, low-cost algorithmic iteration with a robust, managed real-world environment for stable final policy training, all integrated within a symbiotic data-policy feedback loop. Trained on the Qwen3-30B-A3B model, WebSailor-V2 achieves state-of-the-art results, scoring 35.3 on BrowseComp-EN, 44.1 on BrowseComp-ZH, and 30.6 on Humanity's Last Exam (HLE). Notably, our 30B-A3B MoE agent significantly outperforms all existing open-source agents and surpasses even the 671B DeepSeek-V3.1, demonstrating performance competitive with leading proprietary systems.

Cite

Text

Li et al. "WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning." International Conference on Learning Representations, 2026.

Markdown

[Li et al. "WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/li2026iclr-websailorv2/)

BibTeX

@inproceedings{li2026iclr-websailorv2,
  title     = {{WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning}},
  author    = {Li, Kuan and Zhang, Zhongwang and Yin, Huifeng and Ye, Rui and Zhao, Yida and Zhang, Liwen and Ou, Litu and Zhang, Ding-Chu and Wu, Xixi and Yu, Xinmiao and Wu, Jialong and Wang, Xinyu and Qiao, Zile and Zhang, Zhen and Jiang, Yong and Xie, Pengjun and Huang, Fei and Xu, Zhi-Qin John and Wang, Shuai and Cheng, Minhao and Zhou, Jingren},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/li2026iclr-websailorv2/}
}