TABGEN-RAG: Iterative Retrieval for Tabular Data Generation with Large Language Models

Abstract

Large language models (LLMs) have achieved encouraging results on tabular data generation. However, existing approaches require fine-tuning, which is computationally expensive. This paper explores an alternative: prompting a fixed LLM with in-context examples. Two main challenges arise: 1) the limited input token length of LLMs makes it infeasible to present the entire training table, and 2) LLMs must learn effectively from the in-context examples they are given. To address these challenges, we propose a novel retrieval-augmented generation (RAG) framework, TabGen-RAG, to enhance the in-context learning ability of LLMs for tabular data generation. TabGen-RAG operates iteratively, retrieving a subset of real samples that represent the residual between the currently generated samples and the true data. Extensive experiments on five real-world tabular datasets demonstrate that TabGen-RAG significantly improves the quality of generated samples.
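The abstract describes an iterative retrieve-then-generate loop. A minimal Python sketch of such a loop is shown below, assuming embedding distance is used as the residual signal; the function names (`embed`, `llm_generate`), the retrieval size `k`, and the number of rounds are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def tabgen_rag_sketch(real_rows, llm_generate, embed, k=8, n_rounds=5):
    """Hypothetical sketch: iteratively retrieve real rows that the current
    synthetic set covers poorly, then prompt a fixed LLM with them as
    in-context examples."""
    synthetic = []
    real_emb = embed(real_rows)                      # (N, d) embeddings of real rows
    for _ in range(n_rounds):
        if synthetic:
            syn_emb = embed(synthetic)               # (M, d) embeddings of generated rows
            # Residual proxy (assumption): distance from each real row
            # to its nearest currently generated row.
            dists = np.linalg.norm(
                real_emb[:, None, :] - syn_emb[None, :, :], axis=-1
            ).min(axis=1)
        else:
            dists = np.full(len(real_rows), np.inf)  # nothing generated yet
        # Retrieve the k real rows least represented by the synthetic data so far.
        exemplar_idx = np.argsort(-dists)[:k]
        exemplars = [real_rows[i] for i in exemplar_idx]
        # Prompt the frozen LLM with these exemplars as in-context examples.
        synthetic.extend(llm_generate(exemplars))
    return synthetic
```

Here `llm_generate` stands in for a prompted, frozen LLM that returns new rows, and `embed` for any row-embedding function; the paper's actual residual measure, prompt format, and stopping criterion are not specified in this abstract.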

Cite

Text

Fang et al. "TABGEN-RAG: Iterative Retrieval for Tabular Data Generation with Large Language Models." NeurIPS 2024 Workshops: TRL, 2024.

Markdown

[Fang et al. "TABGEN-RAG: Iterative Retrieval for Tabular Data Generation with Large Language Models." NeurIPS 2024 Workshops: TRL, 2024.](https://mlanthology.org/neuripsw/2024/fang2024neuripsw-tabgenrag/)

BibTeX

@inproceedings{fang2024neuripsw-tabgenrag,
  title     = {{TABGEN-RAG: Iterative Retrieval for Tabular Data Generation with Large Language Models}},
  author    = {Fang, Liancheng and Liu, Aiwei and Zhang, Hengrui and Zou, Henry Peng and Zhang, Weizhi and Yu, Philip S.},
  booktitle = {NeurIPS 2024 Workshops: TRL},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/fang2024neuripsw-tabgenrag/}
}