TableRAG: Million-Token Table Understanding with Language Models

Abstract

Recent advancements in language models (LMs) have notably enhanced their ability to reason with tabular data, primarily through program-aided mechanisms that manipulate and analyze tables. However, these methods often require the entire table as input, leading to scalability challenges due to positional bias or context length constraints. In response to these challenges, we introduce TableRAG, a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. This enables more efficient data encoding and precise retrieval, significantly reducing prompt lengths and mitigating information loss. We have developed two new million-token benchmarks from the Arcade and BIRD-SQL datasets to thoroughly evaluate TableRAG's effectiveness at scale. Our results demonstrate that TableRAG's retrieval design achieves the highest retrieval quality, leading to new state-of-the-art performance on large-scale table understanding.
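
To make the retrieval idea in the abstract concrete, here is a minimal sketch of schema and cell retrieval feeding a short prompt. It is not the authors' implementation. The expanded queries are hand-written rather than generated by an LM, relevance is plain token overlap rather than a learned retriever, and every name (`tokens`, `overlap`, `retrieve`, `build_prompt`, the toy table) is hypothetical.

```python
def tokens(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(c if c.isalnum() else " " for c in str(text).lower())
    return set(cleaned.split())


def overlap(query, candidate):
    """Toy relevance score: number of shared tokens (assumed stand-in for a real retriever)."""
    return len(tokens(query) & tokens(candidate))


def retrieve(queries, candidates, k):
    """Return up to k candidates ranked by their best score over all queries."""
    scored = [(max((overlap(q, c) for q in queries), default=0), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s > 0]  # drop candidates with no match
    scored.sort(key=lambda sc: -sc[0])  # stable sort keeps table order on ties
    return [c for _, c in scored[:k]]


def build_prompt(table, question, schema_queries, cell_queries, k=2):
    """Build a compact prompt from retrieved columns and cells instead of the whole table."""
    columns = list(table[0].keys())
    # Schema retrieval: match the expanded queries against column names.
    schema = retrieve(schema_queries, columns, k)
    # Cell retrieval: match against distinct "column: value" pairs, not every row.
    distinct_cells = list(dict.fromkeys(
        f"{col}: {row[col]}" for row in table for col in columns
    ))
    cells = retrieve(cell_queries, distinct_cells, k)
    return "\n".join([
        "Question: " + question,
        "Relevant columns: " + ", ".join(schema),
        "Relevant cells: " + "; ".join(cells),
    ])


# Toy table (hypothetical data, for illustration only).
table = [
    {"product": "blue widget", "region": "north", "unit_price": 4.5},
    {"product": "red gadget", "region": "south", "unit_price": 7.0},
    {"product": "blue widget", "region": "south", "unit_price": 4.5},
]
question = "What is the average price of widgets?"

# Hand-written expansions of the question (an assumption; no LM is called here).
prompt = build_prompt(table, question, ["price", "product"], ["widget"])
print(prompt)
```

The prompt built this way grows with the number of retrieved columns and cells rather than with the number of rows, which is the property the abstract attributes to retrieval over full-table input.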

Cite

Text

Chen et al. "TableRAG: Million-Token Table Understanding with Language Models." Neural Information Processing Systems, 2024. doi:10.52202/079017-2382

Markdown

[Chen et al. "TableRAG: Million-Token Table Understanding with Language Models." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/chen2024neurips-tablerag/) doi:10.52202/079017-2382

BibTeX

@inproceedings{chen2024neurips-tablerag,
  title     = {{TableRAG: Million-Token Table Understanding with Language Models}},
  author    = {Chen, Si-An and Miculicich, Lesly and Eisenschlos, Julian Martin and Wang, Zifeng and Wang, Zilong and Chen, Yanfei and Fujii, Yasuhisa and Lin, Hsuan-Tien and Lee, Chen-Yu and Pfister, Tomas},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2382},
  url       = {https://mlanthology.org/neurips/2024/chen2024neurips-tablerag/}
}