Plug-and-Play Diffusion Distillation
Abstract
Diffusion models have shown tremendous results in image generation. However, due to the iterative nature of the diffusion process and its reliance on classifier-free guidance, inference times are slow. In this paper, we propose a new distillation approach for guided diffusion models in which an external lightweight guide model is trained while the original text-to-image model remains frozen. We show that our method reduces the inference computation of classifier-free guided latent-space diffusion models by almost half and requires trainable parameters amounting to only 1% of the base model. Furthermore, once trained, our guide model can be applied to various fine-tuned, domain-specific versions of the base diffusion model without the need for additional training: this "plug-and-play" functionality drastically reduces inference computation while maintaining the visual fidelity of generated images. Empirically, we show that our approach is able to produce visually appealing results and achieve a comparable FID score to the teacher with as few as 8 to 16 steps.
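To make the "almost half" claim concrete, here is a minimal sketch of the cost accounting, not the authors' implementation. It assumes that standard classifier-free guidance runs the base model twice per step (unconditional and conditional) and combines the two noise predictions, and that the distilled setup runs the frozen base model once per step plus a lightweight guide. The function names and the `guide_cost_fraction` parameter are hypothetical, introduced only for illustration; the abstract states the guide's trainable parameter share (1%), not its compute cost.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Standard classifier-free guidance combination of two noise predictions."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def base_forward_passes(num_steps, use_cfg):
    """Base-model evaluations over a sampling run.

    With classifier-free guidance, each step needs an unconditional and a
    conditional pass; without it, one pass per step suffices.
    """
    return num_steps * (2 if use_cfg else 1)


def relative_cost(guide_cost_fraction):
    """Per-step cost of (one base pass + guide) relative to two base passes.

    guide_cost_fraction is a hypothetical value: the guide's compute cost
    expressed as a fraction of one base-model pass.
    """
    return (1.0 + guide_cost_fraction) / 2.0


if __name__ == "__main__":
    # Toy two-element "noise predictions", purely illustrative.
    print(cfg_combine([0.0, 1.0], [1.0, 1.0], 7.5))
    print(base_forward_passes(16, use_cfg=True))
    print(base_forward_passes(16, use_cfg=False))
    print(relative_cost(0.05))  # 0.05 is an assumed figure, not from the paper
```

Under these assumptions, the relative cost approaches one half as the guide's cost becomes negligible compared with a base-model pass, which is consistent with the reduction the abstract reports.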
Cite
Text
Hsiao et al. "Plug-and-Play Diffusion Distillation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01304
Markdown
[Hsiao et al. "Plug-and-Play Diffusion Distillation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/hsiao2024cvpr-plugandplay/) doi:10.1109/CVPR52733.2024.01304
BibTeX
@inproceedings{hsiao2024cvpr-plugandplay,
title = {{Plug-and-Play Diffusion Distillation}},
author = {Hsiao, Yi-Ting and Khodadadeh, Siavash and Duarte, Kevin and Lin, Wei-An and Qu, Hui and Kwon, Mingi and Kalarot, Ratheesh},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {13743-13752},
doi = {10.1109/CVPR52733.2024.01304},
url = {https://mlanthology.org/cvpr/2024/hsiao2024cvpr-plugandplay/}
}