TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Wu, Xianjie; Yang, Jian; Chai, Linzheng; Zhang, Ge; Liu, Jiaheng; Du, Xeron; Liang, Di; Shu, Daixin; Cheng, Xianfu; Sun, Tianzhen; Li, Tongliang; Li, Zhoujun; Niu, Guanglin

doi:10.1609/AAAI.V39I24.34739

AAAI 2025 pp. 25497-25506

Abstract

Recent advancements in Large Language Models (LLMs) have markedly enhanced the interpretation and processing of tabular data, introducing previously unimaginable capabilities. Despite these achievements, LLMs still encounter significant challenges in industrial scenarios, particularly because real-world tabular data demands more complex reasoning, which underscores a notable gap between academic benchmarks and practical applications. To address this gap, we conduct a detailed investigation into the application of tabular data in industrial scenarios and propose TableBench, a comprehensive and complex benchmark covering 18 fields within four major categories of table question answering (TableQA) capabilities. Furthermore, we introduce TableLLM, trained on our meticulously constructed training set TableInstruct, which achieves performance comparable to GPT-3.5. Extensive experiments on TableBench indicate that both open-source and proprietary LLMs still have significant room for improvement to meet real-world demands: even the most advanced model, GPT-4, achieves only a modest score compared to humans.

Cite

Text

Wu et al. "TableBench: A Comprehensive and Complex Benchmark for Table Question Answering." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I24.34739

Markdown

[Wu et al. "TableBench: A Comprehensive and Complex Benchmark for Table Question Answering." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/wu2025aaai-tablebench/) doi:10.1609/AAAI.V39I24.34739

BibTeX

@inproceedings{wu2025aaai-tablebench,
  title     = {{TableBench: A Comprehensive and Complex Benchmark for Table Question Answering}},
  author    = {Wu, Xianjie and Yang, Jian and Chai, Linzheng and Zhang, Ge and Liu, Jiaheng and Du, Xeron and Liang, Di and Shu, Daixin and Cheng, Xianfu and Sun, Tianzhen and Li, Tongliang and Li, Zhoujun and Niu, Guanglin},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {25497-25506},
  doi       = {10.1609/AAAI.V39I24.34739},
  url       = {https://mlanthology.org/aaai/2025/wu2025aaai-tablebench/}
}