Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Abstract

Recently, diffusion models have made remarkable progress in text-to-image (T2I) generation, synthesizing images with high fidelity and diverse contents. Despite this advancement, latent space smoothness within diffusion models remains largely unexplored. Smooth latent spaces ensure that a perturbation on an input latent corresponds to a steady change in the output image. This property proves beneficial in downstream tasks, including image interpolation, inversion, and editing. In this work, we expose the non-smoothness of diffusion latent spaces by observing noticeable visual fluctuations resulting from minor latent variations. To tackle this issue, we propose Smooth Diffusion, a new category of diffusion models that can be simultaneously high-performing and smooth. Specifically, we introduce Step-wise Variation Regularization to enforce that the ratio between the variation of an arbitrary input latent and that of the output image is constant at any diffusion training step. In addition, we devise an interpolation standard deviation (ISTD) metric to effectively assess the latent space smoothness of a diffusion model. Extensive quantitative and qualitative experiments demonstrate that Smooth Diffusion stands out as a more desirable solution, not only in T2I generation but also across various downstream tasks. Smooth Diffusion is implemented as a plug-and-play Smooth-LoRA to work with various community models. Code is available at https://github.com/SHI-Labs/Smooth-Diffusion.
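
The idea behind an interpolation-based smoothness score can be illustrated with a toy sketch. This is not the paper's ISTD implementation: the function name `interpolation_std`, the linear interpolation between latents, the L2 distance, and the two stand-in "generators" are all assumptions made for illustration. The sketch interpolates between two latents, maps each interpolated latent to an output, and reports the standard deviation of the distances between adjacent outputs; a smoother mapping should give a lower score.

```python
import numpy as np


def interpolation_std(generate, z_start, z_end, num_steps=9):
    """Std of the distances between outputs at adjacent interpolation steps.

    `generate` is a hypothetical stand-in for a latent-to-image mapping.
    """
    ratios = np.linspace(0.0, 1.0, num_steps)
    # Linear interpolation between the two latents (an assumption).
    outputs = [generate((1.0 - r) * z_start + r * z_end) for r in ratios]
    # L2 distance between each pair of adjacent outputs (an assumption).
    step_sizes = [
        np.linalg.norm(b - a) for a, b in zip(outputs[:-1], outputs[1:])
    ]
    return float(np.std(step_sizes))


rng = np.random.default_rng(0)
z_a, z_b = rng.normal(size=4), rng.normal(size=4)
weights = rng.normal(size=(8, 4))


def smooth_generator(z):
    # Toy linear map: equal latent steps give equal output steps.
    return weights @ z


def rough_generator(z):
    # Toy nonlinear map: equal latent steps give uneven output steps.
    return np.sin(3.0 * (weights @ z))


smooth_score = interpolation_std(smooth_generator, z_a, z_b)
rough_score = interpolation_std(rough_generator, z_a, z_b)
print(f"smooth: {smooth_score:.3e}  rough: {rough_score:.3e}")
```

In this toy setting the linear map scores essentially zero because every interpolation step moves the output by the same amount, while the nonlinear map produces uneven steps and a higher score. With a real diffusion model, `generate` would be replaced by the full sampling pipeline.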

Cite

Text

Guo et al. "Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00721

Markdown

[Guo et al. "Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/guo2024cvpr-smooth/) doi:10.1109/CVPR52733.2024.00721

BibTeX

@inproceedings{guo2024cvpr-smooth,
  title     = {{Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models}},
  author    = {Guo, Jiayi and Xu, Xingqian and Pu, Yifan and Ni, Zanlin and Wang, Chaofei and Vasu, Manushree and Song, Shiji and Huang, Gao and Shi, Humphrey},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {7548-7558},
  doi       = {10.1109/CVPR52733.2024.00721},
  url       = {https://mlanthology.org/cvpr/2024/guo2024cvpr-smooth/}
}