DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

Du, Mingxuan; Xu, Benfeng; Zhu, Chiwei; Zhang, Licheng; Wang, Xiaorui; Mao, Zhendong

ICLR 2026

Abstract

Deep Research Agents (DRAs) are emerging as one of the most practical classes of LLM-based agents. Given an open-ended research task, they find, analyze, and synthesize large numbers of online sources to produce a comprehensive, analyst-level report, which can compress hours of manual desk research into minutes. However, no comprehensive benchmark yet exists for systematically evaluating the capabilities of these agents. To bridge this gap, we introduce DeepResearch Bench, a benchmark consisting of 100 PhD-level research tasks, each meticulously crafted by domain experts across 22 distinct fields. To evaluate DRAs comprehensively, we propose two complementary and fully automated methodologies. The first is a reference-based method with adaptive criteria that assesses the quality of generated research reports. The second evaluates a DRA’s information retrieval and collection capabilities by measuring its effective citation count and overall citation accuracy. Through extensive human-consistency experiments, we demonstrate that our evaluation methods align closely with expert judges and faithfully reflect human judgments of quality differences among DRA-generated content. We are open-sourcing DeepResearch Bench and key components of these frameworks to accelerate the development of practical LLM-based agents.

Cite

Text

Du et al. "DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents." International Conference on Learning Representations, 2026.

Markdown

[Du et al. "DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/du2026iclr-deepresearch/)

BibTeX

@inproceedings{du2026iclr-deepresearch,
  title     = {{DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents}},
  author    = {Du, Mingxuan and Xu, Benfeng and Zhu, Chiwei and Zhang, Licheng and Wang, Xiaorui and Mao, Zhendong},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/du2026iclr-deepresearch/}
}