Bridging the Semantic Gap Between Text and Table: A Case Study on NL2SQL

Abstract

The rise of Large Language Models (LLMs) has revolutionized numerous domains, yet these models still exhibit weaknesses in understanding structured tabular data. Although growing context windows promise to accommodate larger volumes of table content, a longer context does not inherently improve a model's ability to understand the underlying structure and semantics of tabular data. To bridge the semantic gap between **T**ext and **T**able, we propose **T**n**T**, a table-language model that features multimodal table representations to empower LLMs to effectively and efficiently abstract structure-enriched semantics from tabular data. **T**n**T** also introduces a scalable and efficient training pipeline, featuring novel self-supervised tasks, to integrate abstract tabular knowledge into the language modality. Extensive experimental results on NL2SQL demonstrate that **T**n**T** achieves substantially better table understanding, with up to **14.4** points higher execution accuracy compared with traditional text-based table representations.

Cite

Text

Long et al. "Bridging the Semantic Gap Between Text and Table: A Case Study on NL2SQL." International Conference on Learning Representations, 2025.

Markdown

[Long et al. "Bridging the Semantic Gap Between Text and Table: A Case Study on NL2SQL." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/long2025iclr-bridging/)

BibTeX

@inproceedings{long2025iclr-bridging,
  title     = {{Bridging the Semantic Gap Between Text and Table: A Case Study on NL2SQL}},
  author    = {Long, Lin and Gu, Xijun and Sun, Xinjie and Ye, Wentao and Wang, Haobo and Wu, Sai and Chen, Gang and Zhao, Junbo},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/long2025iclr-bridging/}
}