GenDataAgent: On-the-Fly Dataset Augmentation with Synthetic Data
Abstract
We propose a generative agent that augments training datasets with synthetic data for model fine-tuning. Unlike prior work, which uniformly samples synthetic data, our agent iteratively generates relevant samples on-the-fly, aligning with the target distribution. It prioritizes synthetic data that complements difficult training samples, focusing on those with high variance in gradient updates. Experiments across several image classification tasks demonstrate the effectiveness of our approach.
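To illustrate the idea of focusing on samples with high variance in gradient updates, here is a minimal sketch. It is not the paper's implementation: the abstract does not specify the exact criterion, so the function name `select_difficult`, the `grad_norm_history` structure, and the use of per-sample gradient-norm variance as a difficulty score are all assumptions made for illustration.

```python
from statistics import pvariance


def select_difficult(grad_norm_history, k):
    """Return the ids of the k samples whose gradient norms vary most.

    grad_norm_history: hypothetical mapping from a sample id to the list of
    gradient norms recorded for that sample at successive training steps.
    The variance-of-norms score is an assumed stand-in for "high variance
    in gradient updates"; the paper may define difficulty differently.
    """
    # Population variance of each sample's recorded gradient norms.
    scores = {sid: pvariance(norms) for sid, norms in grad_norm_history.items()}
    # Highest-variance samples first; keep the top k.
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy history: sample "a" is stable, "b" fluctuates strongly, "c" mildly.
history = {
    "a": [1.0, 1.0, 1.0],
    "b": [0.5, 2.0, 0.1],
    "c": [1.0, 1.2, 0.9],
}
difficult = select_difficult(history, k=2)
```

In a pipeline of the kind the abstract describes, the selected ids would then guide which synthetic samples to generate next; that generation step is outside this sketch.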
Cite
Text
Li et al. "GenDataAgent: On-the-Fly Dataset Augmentation with Synthetic Data." International Conference on Learning Representations, 2025.
Markdown
[Li et al. "GenDataAgent: On-the-Fly Dataset Augmentation with Synthetic Data." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/li2025iclr-gendataagent/)
BibTeX
@inproceedings{li2025iclr-gendataagent,
title = {{GenDataAgent: On-the-Fly Dataset Augmentation with Synthetic Data}},
author = {Li, Zhiteng and Chen, Lele and Andrews, Jerone and Ba, Yunhao and Zhang, Yulun and Xiang, Alice},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/li2025iclr-gendataagent/}
}