Plug-and-Play Diffusion Distillation

Abstract

Diffusion models have shown tremendous results in image generation. However, due to the iterative nature of the diffusion process and its reliance on classifier-free guidance, inference times are slow. In this paper, we propose a new distillation approach for guided diffusion models in which an external lightweight guide model is trained while the original text-to-image model remains frozen. We show that our method reduces the inference computation of classifier-free guided latent-space diffusion models by almost half, while requiring only 1% of the base model's parameters to be trainable. Furthermore, once trained, our guide model can be applied to various fine-tuned, domain-specific versions of the base diffusion model without additional training: this "plug-and-play" functionality drastically improves inference computation while maintaining the visual fidelity of generated images. Empirically, we show that our approach produces visually appealing results and achieves an FID score comparable to the teacher's with as few as 8 to 16 steps.
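For intuition on where the savings come from, below is a minimal, hypothetical PyTorch sketch contrasting a standard classifier-free-guidance (CFG) step, which needs two full passes through the base denoiser, with a distilled step that replaces the unconditional pass with a small external guide network. Every name here (BaseUNet, GuideModel, cfg_step, distilled_step) and the guide's interface are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch: standard CFG vs. a single-pass step with a
# lightweight guide model. All module names and shapes are toy
# assumptions for illustration only.
import torch
import torch.nn as nn

class BaseUNet(nn.Module):
    """Stand-in for the frozen text-to-image latent denoiser."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, x, cond):
        # Toy conditioning: real models inject text via cross-attention.
        return self.net(x + cond)

class GuideModel(nn.Module):
    """Lightweight external module (on the order of 1% of the base
    model's parameters) distilled to reproduce the guidance correction."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Linear(dim, dim)

    def forward(self, eps_cond, cond, scale):
        return scale * self.net(eps_cond + cond)  # toy correction term

def cfg_step(unet, x, cond, uncond, scale):
    # Standard CFG: two full base-model evaluations per denoising step.
    eps_c = unet(x, cond)
    eps_u = unet(x, uncond)
    return eps_u + scale * (eps_c - eps_u)

def distilled_step(unet, guide, x, cond, scale):
    # Distilled inference: one base pass plus a cheap guide pass.
    eps_c = unet(x, cond)
    return eps_c + guide(eps_c, cond, scale)

if __name__ == "__main__":
    torch.manual_seed(0)
    unet, guide = BaseUNet(), GuideModel()
    x = torch.randn(1, 64)           # noisy latent (toy shape)
    cond = torch.randn(1, 64)        # text-conditioning embedding (toy)
    uncond = torch.zeros_like(cond)  # null embedding for the CFG branch
    teacher = cfg_step(unet, x, cond, uncond, scale=7.5)
    student = distilled_step(unet, guide, x, cond, scale=7.5)
    # During distillation, only `guide` is trained so that student ≈ teacher
    # while `unet` stays frozen.
    print(teacher.shape, student.shape)

Since the base pass dominates the cost and the guide is roughly 1% of its size, per-step compute drops by nearly half, which is consistent with the abstract's claim; freezing the base model is also what allows the same guide to be reused with fine-tuned variants.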

Cite

Text

Hsiao et al. "Plug-and-Play Diffusion Distillation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01304

Markdown

[Hsiao et al. "Plug-and-Play Diffusion Distillation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/hsiao2024cvpr-plugandplay/) doi:10.1109/CVPR52733.2024.01304

BibTeX

@inproceedings{hsiao2024cvpr-plugandplay,
  title     = {{Plug-and-Play Diffusion Distillation}},
  author    = {Hsiao, Yi-Ting and Khodadadeh, Siavash and Duarte, Kevin and Lin, Wei-An and Qu, Hui and Kwon, Mingi and Kalarot, Ratheesh},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {13743--13752},
  doi       = {10.1109/CVPR52733.2024.01304},
  url       = {https://mlanthology.org/cvpr/2024/hsiao2024cvpr-plugandplay/}
}