RACOON: An LLM-Based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph

Abstract

As an important component of data exploration and integration, Column Type Annotation (CTA) aims to label the columns of a table with one or more semantic types. With the recent development of Large Language Models (LLMs), researchers have started to explore using LLMs for CTA, leveraging their strong zero-shot capabilities. In this paper, we build on this promising work and improve LLM-based methods for CTA by showing how to use a Knowledge Graph (KG) to augment the context information provided to the LLM. Our approach, called RACOON, combines pre-trained parametric knowledge with non-parametric knowledge during generation to improve LLMs’ performance on CTA. Our experiments show that RACOON achieves up to a 0.21 improvement in micro F1 over vanilla LLM inference.
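For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-averaged F1 score can be computed when each column may carry one or more semantic types. It is not the paper's evaluation code: the function name `micro_f1`, the column identifiers, and the type labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def micro_f1(gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]) -> float:
    """Micro-averaged F1 over (column, type) pairs.

    Assumption: `gold` lists every evaluated column; a column that is
    missing from `pred` is treated as having no predicted types.
    """
    tp = fp = fn = 0
    for column, gold_types in gold.items():
        pred_types = pred.get(column, set())
        tp += len(gold_types & pred_types)   # correctly predicted types
        fp += len(pred_types - gold_types)   # predicted but not in gold
        fn += len(gold_types - pred_types)   # in gold but not predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations (not data from the paper).
gold = {"col_0": {"city"}, "col_1": {"person", "athlete"}, "col_2": {"country"}}
pred = {"col_0": {"city"}, "col_1": {"person"}, "col_2": {"city"}}
score = micro_f1(gold, pred)
```

In this toy case there are 2 true positives, 1 false positive, and 2 false negatives, so precision is 2/3, recall is 1/2, and the micro F1 is 4/7 (about 0.571). Because counts are pooled across all columns before averaging, frequent types weigh more heavily than rare ones.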

Cite

Text

Wei et al. "RACOON: An LLM-Based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph." NeurIPS 2024 Workshops: TRL, 2024.

Markdown

[Wei et al. "RACOON: An LLM-Based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph." NeurIPS 2024 Workshops: TRL, 2024.](https://mlanthology.org/neuripsw/2024/wei2024neuripsw-racoon/)

BibTeX

@inproceedings{wei2024neuripsw-racoon,
  title     = {{RACOON: An LLM-Based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph}},
  author    = {Wei, Lindsey Linxi and Xiao, Guorui and Balazinska, Magdalena},
  booktitle = {NeurIPS 2024 Workshops: TRL},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/wei2024neuripsw-racoon/}
}