Fixed Point Diffusion Models

Abstract

We introduce the Fixed Point Diffusion Model (FPDM), a novel approach to image generation that integrates the concept of fixed point solving into the framework of diffusion-based generative modeling. Our approach embeds an implicit fixed point solving layer into the denoising network of a diffusion model, transforming the diffusion process into a sequence of closely related fixed point problems. Combined with a new stochastic training method, this approach significantly reduces model size, reduces memory usage, and accelerates training. Moreover, it enables the development of two new techniques to improve sampling efficiency: reallocating computation across timesteps and reusing fixed point solutions between timesteps. We conduct extensive experiments with state-of-the-art models on ImageNet, FFHQ, CelebA-HQ, and LSUN-Church, demonstrating substantial improvements in performance and efficiency. Compared to the state-of-the-art DiT model, FPDM contains 87% fewer parameters, consumes 60% less memory during training, and improves image generation quality in situations where sampling computation or time is limited.
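To make the core idea concrete, below is a minimal, self-contained sketch (plain NumPy, toy dynamics) of a fixed point layer inside a reverse-diffusion loop, where the previous timestep's solution warm-starts the next solve. All names (`fixed_point_solve`, `layer`, the `x` update rule) are illustrative assumptions, not the paper's actual architecture or solver.

```python
import numpy as np

def fixed_point_solve(f, h0, tol=1e-5, max_iters=100):
    """Solve h = f(h) by naive forward iteration.

    Illustrative only: the paper's solver and stopping rule may differ.
    """
    h = h0
    for _ in range(max_iters):
        h_next = f(h)
        if np.linalg.norm(h_next - h) <= tol:
            return h_next
        h = h_next
    return h

# Toy "implicit layer": h* = tanh(W h + U x + b(t)).
# W is scaled small so the map is a contraction and iteration converges;
# a trained network would need other convergence guarantees.
rng = np.random.default_rng(0)
d = 16
W = 0.3 * rng.standard_normal((d, d)) / np.sqrt(d)
U = rng.standard_normal((d, d)) / np.sqrt(d)

def layer(h, x, t):
    return np.tanh(W @ h + U @ x + 0.01 * t)

def sample(x_T, num_steps=10, step_size=0.1):
    """Reverse process where each timestep poses a fixed point problem.

    Carrying h across timesteps mirrors the "reusing fixed point
    solutions between timesteps" idea; the x update below is a
    stand-in, not the actual diffusion posterior step.
    """
    x = x_T
    h = np.zeros(d)  # warm start: reused across timesteps, not reset
    for t in reversed(range(num_steps)):
        h = fixed_point_solve(lambda h_: layer(h_, x, t), h)
        x = x - step_size * h
    return x

x0 = sample(rng.standard_normal(d))
print(x0.shape)  # (16,)
```

Because adjacent timesteps pose closely related fixed point problems, the warm start typically needs fewer iterations than solving from scratch, which is also what lets computation be reallocated across timesteps when the sampling budget is limited.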

Cite

Text

Bai and Melas-Kyriazi. "Fixed Point Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00901

Markdown

[Bai and Melas-Kyriazi. "Fixed Point Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/bai2024cvpr-fixed/) doi:10.1109/CVPR52733.2024.00901

BibTeX

@inproceedings{bai2024cvpr-fixed,
  title     = {{Fixed Point Diffusion Models}},
  author    = {Bai, Xingjian and Melas-Kyriazi, Luke},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {9430--9440},
  doi       = {10.1109/CVPR52733.2024.00901},
  url       = {https://mlanthology.org/cvpr/2024/bai2024cvpr-fixed/}
}