Simpler Diffusion: 1.5 FID on ImageNet512 with Pixel-Space Diffusion

Abstract

Latent diffusion models have become the popular choice for scaling up diffusion models for high-resolution image synthesis. Compared to pixel-space models that are trained end-to-end, latent models are perceived to be more efficient and to produce higher image quality at high resolution. Here we challenge these notions and show that pixel-space models can be very competitive with latent models in both quality and efficiency, achieving 1.5 FID on ImageNet512 and new SOTA results on ImageNet128, ImageNet256 and Kinetics600. We present a simple recipe for scaling end-to-end pixel-space diffusion models to high resolutions. 1: Use the sigmoid loss-weighting (Kingma & Gao, 2023) with our prescribed hyper-parameters. 2: Use our simplified memory-efficient architecture with fewer skip-connections. 3: Scale the model to favor processing the image at a high resolution with fewer parameters, rather than using more parameters at a lower resolution. Combining these with guidance intervals, we obtain a family of pixel-space diffusion models we call Simpler Diffusion (SiD2).
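To make step 1 of the recipe concrete, here is a minimal sketch of a sigmoid loss weighting, assuming the form w(λ) = sigmoid(b − λ) over the log signal-to-noise ratio λ, which is how we understand the weighting of Kingma & Gao (2023). The function names and the bias value `EXAMPLE_BIAS` are illustrative assumptions; the abstract does not state the prescribed hyper-parameters, so the value below is a placeholder rather than the paper's setting.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_loss_weight(log_snr: float, bias: float) -> float:
    """Assumed form of the sigmoid weighting: w(lambda) = sigmoid(bias - lambda).

    log_snr is the log signal-to-noise ratio of the noised sample.
    High-SNR (nearly clean) samples receive a weight close to 0, and
    low-SNR (very noisy) samples receive a weight close to 1.
    """
    return sigmoid(bias - log_snr)


# Placeholder bias for illustration only; NOT the paper's prescribed value.
EXAMPLE_BIAS = 0.0

if __name__ == "__main__":
    for log_snr in (-10.0, -5.0, 0.0, 5.0, 10.0):
        w = sigmoid_loss_weight(log_snr, EXAMPLE_BIAS)
        print(f"log_snr={log_snr:+.1f}  weight={w:.4f}")
```

Under this assumed form, the bias shifts where the weight crosses 0.5 along the log-SNR axis: a larger bias places more weight on less noisy samples, a smaller one on noisier samples.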

Cite

Text

Hoogeboom et al. "Simpler Diffusion: 1.5 FID on ImageNet512 with Pixel-Space Diffusion." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01683

Markdown

[Hoogeboom et al. "Simpler Diffusion: 1.5 FID on ImageNet512 with Pixel-Space Diffusion." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/hoogeboom2025cvpr-simpler/) doi:10.1109/CVPR52734.2025.01683

BibTeX

@inproceedings{hoogeboom2025cvpr-simpler,
  title     = {{Simpler Diffusion: 1.5 FID on ImageNet512 with Pixel-Space Diffusion}},
  author    = {Hoogeboom, Emiel and Mensink, Thomas and Heek, Jonathan and Lamerigts, Kay and Gao, Ruiqi and Salimans, Tim},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {18062--18071},
  doi       = {10.1109/CVPR52734.2025.01683},
  url       = {https://mlanthology.org/cvpr/2025/hoogeboom2025cvpr-simpler/}
}