Dimensionality-Varying Diffusion Process

Abstract

Diffusion models, which learn to reverse a signal destruction process to generate new data, typically require the signal at each step to have the same dimension. We argue that, considering the spatial redundancy in image signals, there is no need to maintain a high dimensionality in the evolution process, especially in the early generation phase. To this end, we make a theoretical generalization of the forward diffusion process via signal decomposition. Concretely, we decompose an image into multiple orthogonal components and control the attenuation of each component when perturbing the image. That way, as the noise strength increases, we are able to diminish the inconsequential components and thus use a lower-dimensional signal to represent the source with little loss of information. This reformulation allows the dimensionality to vary in both training and inference of diffusion models. Extensive experiments on a range of datasets suggest that our approach substantially reduces the computational cost and achieves on-par or even better synthesis performance compared to baseline methods. We also show that our strategy facilitates high-resolution image synthesis and improves the FID of a diffusion model trained on FFHQ at 1024x1024 resolution from 52.40 to 10.46. Code is available at https://github.com/damo-vilab/dvdp.
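
The decomposition idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the 2x2 block-average split, the function names `decompose` and `attenuate`, and the scalar `lam` are assumptions chosen for illustration, and no noise is added. The sketch only shows that an image splits into two orthogonal components, and that once the detail component is fully attenuated, the lower-dimensional block averages carry everything that remains.

```python
import numpy as np

def decompose(image, factor=2):
    """Split an (H, W) image into a coarse component and an orthogonal detail residual.

    Hypothetical helper: block averaging is one simple orthogonal decomposition,
    not necessarily the one used in the paper.
    """
    h, w = image.shape
    # Average each factor x factor block; this is the lower-dimensional signal.
    low = image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    # Repeat each average over its block: the projection onto block-constant images.
    coarse = np.repeat(np.repeat(low, factor, axis=0), factor, axis=1)
    # The residual has zero mean in every block, so it is orthogonal to `coarse`.
    detail = image - coarse
    return low, coarse, detail

def attenuate(coarse, detail, lam):
    """Scale the detail component by lam in [0, 1] (illustrative attenuation)."""
    return coarse + lam * detail

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
low, coarse, detail = decompose(image)

# Inner product of the two components; zero up to floating-point error.
inner = float(np.sum(coarse * detail))

# With lam = 0 only the coarse component remains, and it is fully
# determined by the 4 x 4 array `low`.
fully_attenuated = attenuate(coarse, detail, 0.0)
```

With `lam = 1.0` the original image is recovered exactly, while with `lam = 0.0` the 8x8 result holds no information beyond the 4x4 `low` array, which is the sense in which a lower-dimensional signal can stand in for the source.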

Cite

Text

Zhang et al. "Dimensionality-Varying Diffusion Process." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01375

Markdown

[Zhang et al. "Dimensionality-Varying Diffusion Process." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/zhang2023cvpr-dimensionalityvarying/) doi:10.1109/CVPR52729.2023.01375

BibTeX

@inproceedings{zhang2023cvpr-dimensionalityvarying,
  title     = {{Dimensionality-Varying Diffusion Process}},
  author    = {Zhang, Han and Feng, Ruili and Yang, Zhantao and Huang, Lianghua and Liu, Yu and Zhang, Yifei and Shen, Yujun and Zhao, Deli and Zhou, Jingren and Cheng, Fan},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {14307-14316},
  doi       = {10.1109/CVPR52729.2023.01375},
  url       = {https://mlanthology.org/cvpr/2023/zhang2023cvpr-dimensionalityvarying/}
}