On Distillation of Guided Diffusion Models

Meng, Chenlin; Rombach, Robin; Gao, Ruiqi; Kingma, Diederik; Ermon, Stefano; Ho, Jonathan; Salimans, Tim

doi:10.1109/CVPR52729.2023.01374

CVPR 2023 pp. 14297-14306

Abstract

Classifier-free guided diffusion models have recently been shown to be highly effective at high-resolution image generation, and they have been widely used in large-scale diffusion frameworks including DALL·E 2, Stable Diffusion and Imagen. However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time, since they require evaluating two diffusion models, a class-conditional model and an unconditional model, tens to hundreds of times. To deal with this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from: given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then we progressively distill that model into a diffusion model that requires far fewer sampling steps. For standard diffusion models trained in pixel space, our approach is able to generate images visually comparable to those of the original model using as few as 4 sampling steps on ImageNet 64x64 and CIFAR-10, achieving FID/IS scores comparable to those of the original model while being up to 256 times faster to sample from. For diffusion models trained in latent space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps, accelerating inference by at least 10-fold compared to existing methods on ImageNet 256x256 and LAION datasets. We further demonstrate the effectiveness of our approach on text-guided image editing and inpainting, where our distilled model is able to generate high-quality results using as few as 2-4 denoising steps.
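
The two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The guidance combination `(1 + w) * cond - w * uncond` is the standard classifier-free guidance rule, and the step-halving schedule follows the usual progressive-distillation recipe; the abstract spells out neither, so both are assumptions here. The function names, the guidance weight `w`, and the step counts are hypothetical and chosen only for illustration.

```python
import numpy as np


def guided_prediction(cond_out, uncond_out, w):
    """Combine the two teacher outputs with guidance weight w.

    Assumption: standard classifier-free guidance, which extrapolates
    from the unconditional output toward the conditional one.
    w = 0 recovers the purely conditional prediction.
    """
    return (1.0 + w) * cond_out - w * uncond_out


def stage_one_loss(student_out, cond_out, uncond_out, w):
    """Stage 1 (sketch): a single student is trained to match the
    combined output, so sampling needs one network call per step
    instead of two. Plain mean squared error is used here for
    simplicity (an assumption, not the paper's exact weighting).
    """
    target = guided_prediction(cond_out, uncond_out, w)
    return float(np.mean((student_out - target) ** 2))


def halving_schedule(teacher_steps, student_steps):
    """Stage 2 (sketch): step counts visited when each distillation
    round halves the number of sampling steps (assumed schedule).
    """
    if student_steps < 1 or teacher_steps < student_steps:
        raise ValueError("need 1 <= student_steps <= teacher_steps")
    steps = [teacher_steps]
    while steps[-1] > student_steps:
        steps.append(max(student_steps, steps[-1] // 2))
    return steps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for the two teacher outputs on a batch of 64x64 RGB images.
    cond = rng.normal(size=(2, 64, 64, 3))
    uncond = rng.normal(size=(2, 64, 64, 3))
    w = 2.0  # hypothetical guidance weight

    perfect_student = guided_prediction(cond, uncond, w)
    print("loss of a perfect student:", stage_one_loss(perfect_student, cond, uncond, w))
    print("step counts per round:", halving_schedule(64, 4))
```

In this sketch the student replaces two network evaluations per step with one, and each halving round then reduces the number of steps, which is where the combined speed-up reported in the abstract would come from.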

Cite

Text

Meng et al. "On Distillation of Guided Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01374

Markdown

[Meng et al. "On Distillation of Guided Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/meng2023cvpr-distillation/) doi:10.1109/CVPR52729.2023.01374

BibTeX

@inproceedings{meng2023cvpr-distillation,
  title     = {{On Distillation of Guided Diffusion Models}},
  author    = {Meng, Chenlin and Rombach, Robin and Gao, Ruiqi and Kingma, Diederik and Ermon, Stefano and Ho, Jonathan and Salimans, Tim},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {14297-14306},
  doi       = {10.1109/CVPR52729.2023.01374},
  url       = {https://mlanthology.org/cvpr/2023/meng2023cvpr-distillation/}
}