CRAG - Comprehensive RAG Benchmark

Yang, Xiao; Sun, Kai; Xin, Hao; Sun, Yushi; Bhalla, Nikita; Chen, Xiangsen; Choudhary, Sajal; Gui, Rongze Daniel; Jiang, Ziran Will; Jiang, Ziyu; Kong, Lingkun; Moran, Brian; Wang, Jiaqi; Xu, Yifan Ethan; Yan, An; Yang, Chenyu; Yuan, Eting; Zha, Hanwen; Tang, Nan; Chen, Lei; Scheffer, Nicolas; Liu, Yue; Shah, Nirav; Wanga, Rakesh; Kumar, Anuj; Yih, Wen-tau; Dong, Xin Luna

doi:10.52202/079017-0335

NeurIPS 2024

Abstract

Retrieval-Augmented Generation (RAG) has recently emerged as a promising way to compensate for the knowledge gaps of Large Language Models (LLMs). Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamism ranging from years to seconds. Our evaluation on this benchmark highlights the gap to fully trustworthy QA. Whereas the most advanced LLMs achieve $\le 34\%$ accuracy on CRAG, adding RAG in a straightforward manner improves accuracy only to 44%. State-of-the-art industry RAG solutions answer only 63% of questions without any hallucination. CRAG also reveals much lower accuracy in answering questions regarding facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions. The CRAG benchmark laid the groundwork for a KDD Cup 2024 challenge and attracted thousands of participants and submissions. We commit to maintaining CRAG to serve research communities in advancing RAG solutions and general QA solutions. CRAG is available at https://github.com/facebookresearch/CRAG/.

Cite

Text

Yang et al. "CRAG - Comprehensive RAG Benchmark." Neural Information Processing Systems, 2024. doi:10.52202/079017-0335

Markdown

[Yang et al. "CRAG - Comprehensive RAG Benchmark." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/yang2024neurips-crag/) doi:10.52202/079017-0335

BibTeX

@inproceedings{yang2024neurips-crag,
  title     = {{CRAG - Comprehensive RAG Benchmark}},
  author    = {Yang, Xiao and Sun, Kai and Xin, Hao and Sun, Yushi and Bhalla, Nikita and Chen, Xiangsen and Choudhary, Sajal and Gui, Rongze Daniel and Jiang, Ziran Will and Jiang, Ziyu and Kong, Lingkun and Moran, Brian and Wang, Jiaqi and Xu, Yifan Ethan and Yan, An and Yang, Chenyu and Yuan, Eting and Zha, Hanwen and Tang, Nan and Chen, Lei and Scheffer, Nicolas and Liu, Yue and Shah, Nirav and Wanga, Rakesh and Kumar, Anuj and Yih, Wen-tau and Dong, Xin Luna},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0335},
  url       = {https://mlanthology.org/neurips/2024/yang2024neurips-crag/}
}