TABGEN-RAG: Iterative Retrieval for Tabular Data Generation with Large Language Models
Abstract
Large language models (LLMs) have achieved encouraging results on tabular data generation. However, existing approaches require fine-tuning, which is computationally expensive. This paper explores an alternative: prompting a fixed LLM with in-context examples. Two main challenges arise: 1) presenting the entire training table to an LLM with a limited input token length, and 2) ensuring the LLM learns effectively from the in-context examples. To address these challenges, we propose a novel retrieval-augmented generation (RAG) framework, TabGen-RAG, to enhance the in-context learning ability of LLMs for tabular data generation. TabGen-RAG operates iteratively, retrieving a subset of real samples that represents the residual between the currently generated samples and the true data. Extensive experiments on five real-world tabular datasets demonstrate that TabGen-RAG significantly improves the quality of generated samples.
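The abstract only outlines the iterative retrieval loop, so the following is a minimal sketch of how such a loop could look. Everything here is an illustrative assumption rather than the paper's exact algorithm: the placeholder embed function, the caller-supplied llm callable, the prompt wording, and the nearest-neighbor "residual" heuristic are all stand-ins.

# Illustrative sketch only: placeholder names and heuristics, not the paper's exact method.
from typing import Callable, List
import numpy as np

def embed(rows: List[str]) -> np.ndarray:
    # Placeholder row encoder: deterministic pseudo-random vectors keyed on the row text.
    # A real system would use a learned tabular or text encoder here.
    vecs = [np.random.default_rng(abs(hash(r)) % 2**32).standard_normal(32) for r in rows]
    return np.stack(vecs)

def retrieve_residual(real_rows: List[str], gen_rows: List[str], k: int) -> List[str]:
    # "Residual" heuristic: pick the k real rows farthest from any currently generated row,
    # i.e. the part of the true distribution that the current generations cover least.
    real_emb, gen_emb = embed(real_rows), embed(gen_rows)
    dists = np.linalg.norm(real_emb[:, None, :] - gen_emb[None, :, :], axis=-1).min(axis=1)
    return [real_rows[i] for i in np.argsort(-dists)[:k]]

def tabgen_rag_loop(real_rows: List[str], llm: Callable[[str], str],
                    n_iters: int = 5, k: int = 8) -> List[str]:
    # Iteratively prompt a *fixed* LLM with retrieved in-context examples; no fine-tuning.
    generated: List[str] = []
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        if generated:
            examples = retrieve_residual(real_rows, generated, k)
        else:
            # First round: nothing generated yet, so sample seed examples uniformly.
            examples = [real_rows[i] for i in rng.choice(len(real_rows), size=k, replace=False)]
        prompt = ("Here are example table rows:\n" + "\n".join(examples)
                  + "\nGenerate new rows in the same comma-separated format:\n")
        generated.extend(line for line in llm(prompt).splitlines() if line.strip())
    return generated

A real implementation would also validate the format of the LLM's output and stop once the generated table matches the real data closely enough.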
Cite
Text
Fang et al. "TABGEN-RAG: Iterative Retrieval for Tabular Data Generation with Large Language Models." NeurIPS 2024 Workshops: TRL, 2024.
Markdown
[Fang et al. "TABGEN-RAG: Iterative Retrieval for Tabular Data Generation with Large Language Models." NeurIPS 2024 Workshops: TRL, 2024.](https://mlanthology.org/neuripsw/2024/fang2024neuripsw-tabgenrag/)
BibTeX
@inproceedings{fang2024neuripsw-tabgenrag,
  title = {{TABGEN-RAG: Iterative Retrieval for Tabular Data Generation with Large Language Models}},
  author = {Fang, Liancheng and Liu, Aiwei and Zhang, Hengrui and Zou, Henry Peng and Zhang, Weizhi and Yu, Philip S.},
  booktitle = {NeurIPS 2024 Workshops: TRL},
  year = {2024},
  url = {https://mlanthology.org/neuripsw/2024/fang2024neuripsw-tabgenrag/}
}