Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner

Zhou, Yitong; Cheng, Mingyue; Mao, Qingyang; Wang, Jiahao; Xu, Feiyang; Li, Xin

doi:10.24963/IJCAI.2025/279

IJCAI 2025 pp. 2503-2511

Abstract

Pre-trained foundation models have recently made significant progress in table-related tasks such as table understanding and reasoning. However, recognizing the structure and content of unstructured tables using Vision Large Language Models (VLLMs) remains under-explored. To bridge this gap, we propose a benchmark based on a hierarchical design philosophy to evaluate the recognition capabilities of VLLMs in training-free scenarios. Through in-depth evaluations, we find that low-quality image input is a significant bottleneck in the recognition process. Motivated by this finding, we propose the Neighbor-Guided Toolchain Reasoner (NGTR) framework, which integrates diverse lightweight tools for visual operations to mitigate issues with low-quality images. Specifically, we transfer tool-selection experience from a similar neighbor to the input and design a reflection module to supervise the tool invocation process. Extensive experiments on public datasets demonstrate that our approach significantly enhances the recognition capabilities of vanilla VLLMs. We believe that the benchmark and framework could provide an alternative solution to table recognition.
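
The abstract names two mechanisms: reusing the tool selection of a similar neighbor, and a reflection step that supervises tool invocation. The toy sketch below illustrates that general idea only; it is not the authors' implementation. Everything in it is an assumption made for illustration: the feature vectors, cosine similarity as the neighbor measure, the `stretch` and `blur` tools, the contrast-based `quality` score, and all function names are hypothetical, and an image is modeled as a plain list of intensities in [0, 1].

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_plan(query_features, memory):
    """Return the tool plan stored with the most similar neighbor.

    `memory` is a hypothetical list of (features, tool_plan) pairs.
    """
    _, plan = max(memory, key=lambda item: cosine(query_features, item[0]))
    return plan


def run_with_reflection(image, plan, tools, quality):
    """Apply tools in order, keeping a result only if quality does not drop.

    The accept/reject check is a toy stand-in for a reflection step.
    """
    current, applied = image, []
    for name in plan:
        candidate = tools[name](current)
        if quality(candidate) >= quality(current):
            current = candidate
            applied.append(name)
    return current, applied


def stretch(img):
    """Linearly rescale intensities so they span [0, 1]."""
    lo, hi = min(img), max(img)
    return list(img) if hi == lo else [(x - lo) / (hi - lo) for x in img]


def blur(img):
    """Moving average over each value and its immediate neighbors."""
    out = []
    for i in range(len(img)):
        window = img[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def quality(img):
    """Toy quality score: intensity range (a crude contrast measure)."""
    return max(img) - min(img)


# Hypothetical memory of past inputs and the tool plans used for them.
memory = [([0.9, 0.1], ["stretch", "blur"]), ([0.1, 0.9], ["blur"])]
plan = nearest_plan([0.8, 0.2], memory)
result, applied = run_with_reflection(
    [0.4, 0.5, 0.6], plan, {"stretch": stretch, "blur": blur}, quality
)
```

In this toy run the plan borrowed from the neighbor contains both tools, but the check rejects `blur` because it lowers the contrast score, so only `stretch` is applied. A real system would presumably use learned image features and a model-based judgment rather than these hand-written stand-ins.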

Cite

Text

Zhou et al. "Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/279

Markdown

[Zhou et al. "Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/zhou2025ijcai-enhancing/) doi:10.24963/IJCAI.2025/279

BibTeX

@inproceedings{zhou2025ijcai-enhancing,
  title     = {{Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner}},
  author    = {Zhou, Yitong and Cheng, Mingyue and Mao, Qingyang and Wang, Jiahao and Xu, Feiyang and Li, Xin},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {2503--2511},
  doi       = {10.24963/IJCAI.2025/279},
  url       = {https://mlanthology.org/ijcai/2025/zhou2025ijcai-enhancing/}
}