InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks
Abstract
In this paper, we introduce InfiAgent-DABench, the first benchmark specifically designed to evaluate LLM-based agents on data analysis tasks. Agents need to solve these tasks end-to-end by interacting with an execution environment. This benchmark contains DAEval, a dataset consisting of 603 data analysis questions derived from 124 CSV files, and an agent framework which incorporates LLMs to serve as data analysis agents for both serving and evaluation. Since data analysis questions are often open-ended and hard to evaluate without human supervision, we adopt a format-prompting technique to convert each question into a closed-form format so that it can be evaluated automatically. Our extensive benchmarking of 34 LLMs uncovers the challenges these models currently face on data analysis tasks. In addition, building upon our agent framework, we develop a specialized agent, DAAgent, which surpasses GPT-3.5 by 3.9% on DABench. Evaluation datasets and toolkits for InfiAgent-DABench are released at https://github.com/InfiAgent/InfiAgent.
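To illustrate the closed-form evaluation idea described in the abstract, here is a minimal Python sketch. It assumes a question's format prompt instructs the agent to emit answers as `@answer_name[value]` tags (the tag syntax, regex, and helper names here are illustrative, not the benchmark's actual toolkit; see the released repository for the real implementation):

```python
import re

# Pattern for answers emitted as "@name[value]" per the format prompt.
ANSWER_PATTERN = re.compile(r"@(\w+)\[(.+?)\]")

def parse_answers(response: str) -> dict:
    """Extract all '@name[value]' pairs from an agent's free-form response."""
    return {name: value.strip() for name, value in ANSWER_PATTERN.findall(response)}

def grade(response: str, gold: dict) -> bool:
    """A response is correct only if every gold answer matches exactly."""
    predicted = parse_answers(response)
    return all(predicted.get(name) == value for name, value in gold.items())

# Example: an open-ended question ("What is the average age?") is reformulated
# with format constraints ("Report the mean rounded to two decimal places as
# @mean_age[value]."), making string-level automatic grading possible.
gold = {"mean_age": "29.70"}
print(grade("After loading the CSV, the result is @mean_age[29.70].", gold))  # True
```

The point of the closed form is that correctness reduces to exact matching of constrained answer strings, removing the need for human judges on otherwise open-ended analysis questions.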
Cite
Text
Hu et al. "InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks." International Conference on Machine Learning, 2024.
Markdown
[Hu et al. "InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/hu2024icml-infiagentdabench/)
BibTeX
@inproceedings{hu2024icml-infiagentdabench,
  title     = {{InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks}},
  author    = {Hu, Xueyu and Zhao, Ziyu and Wei, Shuang and Chai, Ziwei and Ma, Qianli and Wang, Guoyin and Wang, Xuwu and Su, Jing and Xu, Jingjing and Zhu, Ming and Cheng, Yao and Yuan, Jianbo and Li, Jiwei and Kuang, Kun and Yang, Yang and Yang, Hongxia and Wu, Fei},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {19544--19572},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/hu2024icml-infiagentdabench/}
}