Distilling Diffusion Models into Conditional GANs

Abstract

We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference while preserving image quality. Our approach interprets diffusion distillation as a paired image-to-image translation task, using noise-to-image pairs from the diffusion model's ODE trajectory. For efficient regression loss computation, we propose E-LatentLPIPS, a perceptual loss operating directly in the diffusion model's latent space, utilizing an ensemble of augmentations. Furthermore, we adapt a diffusion model to construct a multi-scale discriminator with a text alignment loss, yielding an effective conditional GAN-based formulation. E-LatentLPIPS converges more efficiently than many existing distillation methods, even accounting for dataset construction costs. We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models (SDXL-Turbo and SDXL-Lightning) on the COCO benchmark.
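To make the regression objective concrete, below is a minimal PyTorch sketch of the core idea: regress a one-step student onto precomputed noise-to-latent pairs from the teacher's ODE trajectory, scoring the match with an ensembled perceptual distance computed directly on latents (no VAE decode). All names here are illustrative assumptions, not the paper's actual modules: `LatentFeatureNet` is a tiny stand-in for the latent-space LPIPS network, the flip/rotation augmentations are placeholders for the paper's augmentation ensemble, and the random tensors stand in for real (noise, ODE-endpoint) pairs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentFeatureNet(nn.Module):
    """Placeholder feature extractor over 4-channel diffusion latents.
    (The paper trains an LPIPS-like network in latent space; this tiny
    conv stack only illustrates the interface.)"""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
        )

    def forward(self, z):
        return self.conv(z)


def shared_augment(za, zb):
    """Apply the SAME random augmentation to both latents so the
    distance remains a valid paired comparison."""
    if torch.rand(()) < 0.5:
        za, zb = torch.flip(za, dims=[-1]), torch.flip(zb, dims=[-1])
    k = int(torch.randint(0, 4, (1,)))
    return torch.rot90(za, k, dims=[-2, -1]), torch.rot90(zb, k, dims=[-2, -1])


def e_latent_lpips(feat_net, z_pred, z_target, n_aug=4):
    """Ensemble the perceptual distance over several random
    augmentations, computed directly in latent space."""
    loss = z_pred.new_zeros(())
    for _ in range(n_aug):
        za, zb = shared_augment(z_pred, z_target)
        loss = loss + F.mse_loss(feat_net(za), feat_net(zb))
    return loss / n_aug


# Usage sketch: one regression step on a precomputed pair.
feat_net = LatentFeatureNet()
generator = nn.Conv2d(4, 4, 3, padding=1)  # stand-in for the one-step student
noise = torch.randn(2, 4, 64, 64)          # input noise latent
z_teacher = torch.randn(2, 4, 64, 64)      # teacher's ODE endpoint (placeholder)
loss = e_latent_lpips(feat_net, generator(noise), z_teacher)
loss.backward()
```

In the full method this regression loss is combined with a conditional GAN loss from a multi-scale, text-aligned discriminator; the sketch covers only the E-LatentLPIPS regression term.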

Cite

Text

Kang et al. "Distilling Diffusion Models into Conditional GANs." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73390-1_25

Markdown

[Kang et al. "Distilling Diffusion Models into Conditional GANs." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/kang2024eccv-distilling/) doi:10.1007/978-3-031-73390-1_25

BibTeX

@inproceedings{kang2024eccv-distilling,
  title     = {{Distilling Diffusion Models into Conditional GANs}},
  author    = {Kang, MinGuk and Zhang, Richard and Barnes, Connelly and Paris, Sylvain and Kwak, Suha and Park, Jaesik and Shechtman, Eli and Zhu, Jun-Yan and Park, Taesung},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73390-1_25},
  url       = {https://mlanthology.org/eccv/2024/kang2024eccv-distilling/}
}