Depth Pro: Sharp Monocular Metric Depth in Less than a Second
Abstract
We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details. The predictions are metric, with absolute scale, without relying on the availability of metadata such as camera intrinsics. And the model is fast, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU. These characteristics are enabled by a number of technical contributions, including an efficient multi-scale vision transformer for dense prediction, a training protocol that combines real and synthetic datasets to achieve high metric accuracy alongside fine boundary tracing, dedicated evaluation metrics for boundary accuracy in estimated depth maps, and state-of-the-art focal length estimation from a single image. Extensive experiments analyze specific design choices and demonstrate that Depth Pro outperforms prior work along multiple dimensions. We release code and weights at https://github.com/apple/ml-depth-pro
Cite
Text

Bochkovskiy et al. "Depth Pro: Sharp Monocular Metric Depth in Less than a Second." International Conference on Learning Representations, 2025.

Markdown

[Bochkovskiy et al. "Depth Pro: Sharp Monocular Metric Depth in Less than a Second." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/bochkovskiy2025iclr-depth/)

BibTeX
@inproceedings{bochkovskiy2025iclr-depth,
  title     = {{Depth Pro: Sharp Monocular Metric Depth in Less than a Second}},
  author    = {Bochkovskiy, Alexey and Delaunoy, Amaël and Germain, Hugo and Santos, Marcel and Zhou, Yichao and Richter, Stephan and Koltun, Vladlen},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/bochkovskiy2025iclr-depth/}
}