Scaling Properties of Diffusion Models for Perceptual Tasks

Abstract

In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm not only for generation but also for visual perception tasks. We unify tasks such as depth estimation, optical flow, and amodal segmentation under the framework of image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute on these perceptual tasks. Through a careful analysis of these scaling properties, we formulate compute-optimal training and inference recipes for scaling diffusion models on visual perception tasks. Our models achieve performance competitive with state-of-the-art methods using significantly less data and compute.

Cite

Text

Ravishankar et al. "Scaling Properties of Diffusion Models for Perceptual Tasks." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01208

Markdown

[Ravishankar et al. "Scaling Properties of Diffusion Models for Perceptual Tasks." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/ravishankar2025cvpr-scaling/) doi:10.1109/CVPR52734.2025.01208

BibTeX

@inproceedings{ravishankar2025cvpr-scaling,
  title     = {{Scaling Properties of Diffusion Models for Perceptual Tasks}},
  author    = {Ravishankar, Rahul and Patel, Zeeshan and Rajasegaran, Jathushan and Malik, Jitendra},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {12945--12954},
  doi       = {10.1109/CVPR52734.2025.01208},
  url       = {https://mlanthology.org/cvpr/2025/ravishankar2025cvpr-scaling/}
}