Your Diffusion Model Is Secretly a Zero-Shot Classifier

Abstract

The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. However, almost all use cases so far have focused solely on sampling. In this paper, we show that the density estimates from large-scale text-to-image diffusion models like Stable Diffusion can be leveraged to perform zero-shot classification without any additional training. Our generative approach to classification, which we call Diffusion Classifier, attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models. We also find that our diffusion-based approach has stronger multimodal relational reasoning abilities than competing discriminative approaches. Finally, we use Diffusion Classifier to extract standard classifiers from class-conditional diffusion models trained on ImageNet. Even though these models are trained with weak augmentations and no regularization, they approach the performance of SOTA discriminative classifiers. Overall, our results are a step toward using generative models over discriminative ones for downstream tasks.
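
The abstract's core idea is to score each candidate class by how well the text-conditioned diffusion model denoises the input under that class's prompt, and to pick the class whose prompt yields the lowest denoising error. The following is a minimal PyTorch sketch of that procedure, not the paper's actual code: unet, encode_text, and scheduler are hypothetical stand-ins for a Stable-Diffusion-style denoiser, text encoder, and noise schedule, and x0 is the input image (or its VAE latent).

import torch

@torch.no_grad()
def diffusion_classify(x0, class_prompts, unet, encode_text, scheduler, n_samples=64):
    # Draw one shared set of (timestep, noise) pairs and reuse it for every class,
    # so the per-class errors are compared on identical noise realizations.
    timesteps = torch.randint(0, scheduler.num_train_timesteps, (n_samples,))
    noises = torch.randn((n_samples, *x0.shape))

    errors = []
    for prompt in class_prompts:
        cond = encode_text(prompt)                   # text conditioning for this class
        err = 0.0
        for t, noise in zip(timesteps, noises):
            x_t = scheduler.add_noise(x0, noise, t)  # forward-diffuse the input to step t
            eps_pred = unet(x_t, t, cond)            # predicted noise, conditioned on the prompt
            err += torch.mean((eps_pred - noise) ** 2).item()
        errors.append(err / n_samples)               # Monte Carlo denoising-error estimate
    # Lowest average error corresponds to the highest (ELBO-based) conditional density.
    return int(torch.tensor(errors).argmin())

Sharing the same (timestep, noise) draws across classes keeps the per-class error estimates directly comparable, since each class prompt is evaluated on the same corruptions of the same input.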

Cite

Text

Li et al. "Your Diffusion Model Is Secretly a Zero-Shot Classifier." ICML 2023 Workshops: SPIGM, 2023.

Markdown

[Li et al. "Your Diffusion Model Is Secretly a Zero-Shot Classifier." ICML 2023 Workshops: SPIGM, 2023.](https://mlanthology.org/icmlw/2023/li2023icmlw-your/)

BibTeX

@inproceedings{li2023icmlw-your,
  title     = {{Your Diffusion Model Is Secretly a Zero-Shot Classifier}},
  author    = {Li, Alexander Cong and Prabhudesai, Mihir and Duggal, Shivam and Brown, Ellis Langham and Pathak, Deepak},
  booktitle = {ICML 2023 Workshops: SPIGM},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/li2023icmlw-your/}
}