Context-Guided Responsible Data Augmentation with Diffusion Models

Abstract

Generative diffusion models are a natural choice for data augmentation when training complex vision models. However, ensuring the reliability of their generative content as augmentation samples remains an open challenge. Although a number of techniques use generative images to strengthen model training, it remains unclear how to combine natural and generative images into a rich supervisory signal for effective model induction. To address this, we propose a text-to-image (T2I) data augmentation method, named DiffCoRe-Mix, that computes a set of generative counterparts for each training sample using an explicitly constrained diffusion model, which leverages sample-based context and negative prompting for reliable augmentation sample generation. To preserve key semantic axes, we also filter out undesired generative samples during augmentation, using a hard-cosine filtration in the embedding space of CLIP. Our approach systematically mixes the natural and generative images at the pixel and patch levels. We extensively evaluate our technique on the ImageNet-1K, Tiny ImageNet-200, CIFAR-100, Flowers102, CUB-Birds, Stanford Cars, and Caltech datasets, demonstrating a notable increase in performance across the board, with up to $\sim 3\%$ absolute gain in top-1 accuracy over state-of-the-art methods at comparable computational overhead.
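
The filtration and mixing steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the similarity threshold, the mixing ratio, or the patch geometry, so the function names, the default `threshold=0.8`, the blend weight `lam`, and the rectangular patch are all hypothetical choices made for illustration. The embeddings are assumed to have been computed beforehand by a CLIP image encoder.

```python
import numpy as np


def hard_cosine_filter(ref_emb, cand_embs, threshold=0.8):
    """Boolean mask of candidates whose cosine similarity to ref_emb is >= threshold.

    ref_emb:   (d,) embedding of the natural image (assumed precomputed).
    cand_embs: (n, d) embeddings of the generated candidates.
    threshold: hypothetical cut-off; the abstract does not give a value.
    """
    ref = ref_emb / np.linalg.norm(ref_emb)
    cands = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    sims = cands @ ref  # cosine similarity of each candidate to the reference
    return sims >= threshold


def pixel_mix(natural, generated, lam=0.5):
    """Pixel-level mix: convex blend of two same-shaped images (assumed form)."""
    return lam * natural + (1.0 - lam) * generated


def patch_mix(natural, generated, top, left, height, width):
    """Patch-level mix: paste one rectangular patch of the generated image
    into a copy of the natural image (assumed form)."""
    mixed = natural.copy()
    mixed[top:top + height, left:left + width] = \
        generated[top:top + height, left:left + width]
    return mixed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    natural = rng.random((32, 32, 3))
    generated = rng.random((4, 32, 32, 3))   # four generated candidates
    ref_emb = rng.random(16)                 # stand-in for a CLIP embedding
    cand_embs = rng.random((4, 16))

    keep = hard_cosine_filter(ref_emb, cand_embs)
    for img in generated[keep]:
        blended = pixel_mix(natural, img, lam=0.7)
        patched = patch_mix(natural, img, top=8, left=8, height=16, width=16)
        print(blended.shape, patched.shape)
```

The filter is "hard" in the sense that a candidate is either kept or discarded by a fixed cut-off, rather than being down-weighted; only the surviving candidates are passed to the mixing functions.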

Cite

Text

Islam and Akhtar. "Context-Guided Responsible Data Augmentation with Diffusion Models." ICLR 2025 Workshops: Data_Problems, 2025.

Markdown

[Islam and Akhtar. "Context-Guided Responsible Data Augmentation with Diffusion Models." ICLR 2025 Workshops: Data_Problems, 2025.](https://mlanthology.org/iclrw/2025/islam2025iclrw-contextguided/)

BibTeX

@inproceedings{islam2025iclrw-contextguided,
  title     = {{Context-Guided Responsible Data Augmentation with Diffusion Models}},
  author    = {Islam, Khawar and Akhtar, Naveed},
  booktitle = {ICLR 2025 Workshops: Data_Problems},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/islam2025iclrw-contextguided/}
}