The Journey, Not the Destination: How Data Guides Diffusion Models

Abstract

Diffusion-based generative models can synthesize photo-realistic images of remarkable quality and diversity. However, *attributing* these images back to the training data (that is, identifying the specific training examples that caused an image to be generated) remains a challenge. In this paper, we propose a framework that: (i) frames data attribution in the context of diffusion models, (ii) provides a method for computing such attributions efficiently, and (iii) allows us to *counterfactually* validate them. We then apply our framework to the CIFAR-10 and MS COCO datasets.

Cite

Text

Georgiev et al. "The Journey, Not the Destination: How Data Guides Diffusion Models." ICML 2023 Workshops: DeployableGenerativeAI, 2023.

Markdown

[Georgiev et al. "The Journey, Not the Destination: How Data Guides Diffusion Models." ICML 2023 Workshops: DeployableGenerativeAI, 2023.](https://mlanthology.org/icmlw/2023/georgiev2023icmlw-journey/)

BibTeX

@inproceedings{georgiev2023icmlw-journey,
  title     = {{The Journey, Not the Destination: How Data Guides Diffusion Models}},
  author    = {Georgiev, Kristian and Vendrow, Joshua and Salman, Hadi and Park, Sung Min and Madry, Aleksander},
  booktitle = {ICML 2023 Workshops: DeployableGenerativeAI},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/georgiev2023icmlw-journey/}
}