Your Diffusion Model Is Secretly a Zero-Shot Classifier
Abstract
The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive compositional generalization abilities. Almost all use cases so far have focused solely on sampling; however, diffusion models can also provide conditional density estimates, which are useful for tasks beyond image generation. In this paper, we show that the density estimates from large-scale text-to-image diffusion models like Stable Diffusion can be leveraged to perform zero-shot classification without any additional training. Our generative approach to classification, which we call Diffusion Classifier, attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models. Although a gap remains between generative and discriminative approaches on zero-shot recognition tasks, our diffusion-based approach has stronger multimodal compositional reasoning abilities than competing discriminative approaches. Finally, we use Diffusion Classifier to extract standard classifiers from class-conditional diffusion models trained on ImageNet. These models approach the performance of state-of-the-art discriminative classifiers and exhibit strong "effective robustness" to distribution shift. Overall, our results are a step toward using generative over discriminative models for downstream tasks.
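The abstract does not spell out how a conditional density estimate becomes a class prediction. One common recipe applies Bayes' rule with a uniform class prior and approximates log p(x | c) by the negative expected noise-prediction error of the diffusion model under class c. The image is noised with shared random (t, ε) draws, and the class whose conditioning best predicts the added noise wins. The sketch below illustrates only that general idea. It is not the authors' implementation, and every name in it is hypothetical. `toy_eps_model` is an analytic noise predictor for toy Gaussian classes x0 | c ~ N(mu_c, I), standing in for a trained conditional network. `alpha_bar` assumes a linear-beta DDPM schedule with 1000 steps.

```python
import numpy as np


def alpha_bar(t: np.ndarray, num_steps: int = 1000) -> np.ndarray:
    """Hypothetical linear-beta DDPM schedule: cumulative product of (1 - beta_t)."""
    betas = np.linspace(1e-4, 0.02, num_steps)
    return np.cumprod(1.0 - betas)[t]


def toy_eps_model(x_t, t, class_means, c):
    """Hypothetical stand-in for a trained class-conditional noise predictor.

    For x0 | c ~ N(mu_c, I) and x_t = sqrt(abar) x0 + sqrt(1 - abar) eps,
    the posterior mean of eps given x_t is sqrt(1 - abar) * (x_t - sqrt(abar) * mu_c).
    """
    ab = alpha_bar(t)[:, None]
    return np.sqrt(1.0 - ab) * (x_t - np.sqrt(ab) * class_means[c])


def diffusion_classify(x0, class_means, eps_model, n_samples=256, seed=0):
    """Pick the class whose conditioning gives the lowest expected noise-prediction error."""
    rng = np.random.default_rng(seed)
    # Reuse the same (t, eps) draws for every class so the comparison is less noisy.
    t = rng.integers(0, 1000, size=n_samples)
    eps = rng.standard_normal((n_samples, x0.shape[0]))
    ab = alpha_bar(t)[:, None]
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps  # forward (noising) process

    errors = []
    for c in range(len(class_means)):
        eps_hat = eps_model(x_t, t, class_means, c)
        # Monte Carlo estimate of E_{t,eps} ||eps - eps_hat(x_t, t, c)||^2
        errors.append(np.mean(np.sum((eps - eps_hat) ** 2, axis=1)))
    errors = np.array(errors)

    # Softmax over negative errors: a rough posterior under a uniform class prior.
    logits = -errors
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.argmin(errors)), probs
```

For example, with `means = np.array([[4.0, 4.0], [-4.0, -4.0]])`, calling `diffusion_classify(means[0], means, toy_eps_model)` should assign most of the probability to class 0. The shared noise draws matter here. The per-class errors differ by a small amount relative to their sampling variance, so reusing identical (t, ε) pairs makes the ranking far more stable than sampling independently for each class.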
Cite
Text
Li et al. "Your Diffusion Model Is Secretly a Zero-Shot Classifier." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00210
Markdown
[Li et al. "Your Diffusion Model Is Secretly a Zero-Shot Classifier." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/li2023iccv-your/) doi:10.1109/ICCV51070.2023.00210
BibTeX
@inproceedings{li2023iccv-your,
title = {{Your Diffusion Model Is Secretly a Zero-Shot Classifier}},
author = {Li, Alexander C. and Prabhudesai, Mihir and Duggal, Shivam and Brown, Ellis and Pathak, Deepak},
booktitle = {International Conference on Computer Vision},
year = {2023},
pages = {2206--2217},
doi = {10.1109/ICCV51070.2023.00210},
url = {https://mlanthology.org/iccv/2023/li2023iccv-your/}
}