WebWalker: Benchmarking LLMs in Web Traversal
Abstract
Retrieval-augmented generation (RAG) demonstrates remarkable performance across tasks in open-domain question answering. However, traditional search engines may retrieve shallow content, limiting the ability of LLMs to handle complex, multi-layered information. To address this, we introduce WebWalkerQA, a benchmark designed to assess the ability of LLMs to perform web traversal. It evaluates the capacity of LLMs to systematically traverse a website's subpages and extract high-quality data. We propose WebWalker, a multi-agent framework that mimics human-like web navigation through an explore-critic paradigm. Extensive experimental results show that WebWalkerQA is challenging and demonstrate the effectiveness of combining RAG with WebWalker through horizontal and vertical integration in real-world scenarios.
Cite
Text
Wu et al. "WebWalker: Benchmarking LLMs in Web Traversal." ICLR 2025 Workshops: LLM_Reason_and_Plan, 2025.
Markdown
[Wu et al. "WebWalker: Benchmarking LLMs in Web Traversal." ICLR 2025 Workshops: LLM_Reason_and_Plan, 2025.](https://mlanthology.org/iclrw/2025/wu2025iclrw-webwalker/)
BibTeX
@inproceedings{wu2025iclrw-webwalker,
title = {{WebWalker: Benchmarking LLMs in Web Traversal}},
author = {Wu, Jialong and Yin, Wenbiao and Jiang, Yong and Wang, Zhenglin and Xi, Zekun and Fang, Runnan and Zhang, Linhai and He, Yulan and Zhou, Deyu and Xie, Pengjun and Huang, Fei},
booktitle = {ICLR 2025 Workshops: LLM_Reason_and_Plan},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/wu2025iclrw-webwalker/}
}