Scaling Properties of Diffusion Models for Perceptual Tasks
Abstract
In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm for not only generation but also visual perception tasks. We unify tasks such as depth estimation, optical flow, and amodal segmentation under the framework of image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perceptual tasks. Through a careful analysis of these scaling properties, we formulate compute-optimal training and inference recipes to scale diffusion models for visual perception tasks. Our models achieve performance competitive with state-of-the-art methods using significantly less data and compute.
Cite
Text
Ravishankar et al. "Scaling Properties of Diffusion Models for Perceptual Tasks." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01208
Markdown
[Ravishankar et al. "Scaling Properties of Diffusion Models for Perceptual Tasks." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/ravishankar2025cvpr-scaling/) doi:10.1109/CVPR52734.2025.01208
BibTeX
@inproceedings{ravishankar2025cvpr-scaling,
title = {{Scaling Properties of Diffusion Models for Perceptual Tasks}},
author = {Ravishankar, Rahul and Patel, Zeeshan and Rajasegaran, Jathushan and Malik, Jitendra},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {12945--12954},
doi = {10.1109/CVPR52734.2025.01208},
url = {https://mlanthology.org/cvpr/2025/ravishankar2025cvpr-scaling/}
}