Towards Zero-Shot Scale-Aware Monocular Depth Estimation

Vitor Guizilini, Igor Vasiljevic, Dian Chen, Rareș Ambruș, Adrien Gaidon

ICCV 2023 pp. 9233-9243

doi:10.1109/ICCV51070.2023.00847

Abstract

Monocular depth estimation is scale-ambiguous, and thus requires scale supervision to produce metric predictions. Even so, the resulting models will be geometry-specific, with learned scales that cannot be directly transferred across domains. Because of that, recent works focus instead on relative depth, eschewing scale in favor of improved up-to-scale zero-shot transfer. In this work we introduce ZeroDepth, a novel monocular depth estimation framework capable of predicting metric scale for arbitrary test images from different domains and camera parameters. This is achieved by (i) the use of input-level geometric embeddings that enable the network to learn a scale prior over objects; and (ii) decoupling the encoder and decoder stages, via a variational latent representation that is conditioned on single frame information. We evaluated ZeroDepth targeting both outdoor (KITTI, DDAD, nuScenes) and indoor (NYUv2) benchmarks, and achieved a new state-of-the-art in both settings using the same pre-trained model, outperforming methods that train on in-domain data and require test-time scaling to produce metric estimates.
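
The scale ambiguity mentioned in the abstract, and the test-time scaling that relative-depth methods rely on, can be illustrated with a small sketch. It is not taken from the paper: the pinhole intrinsics (`focal`, `cx`, `cy`), the sample points, and the helper names `project`, `scale_scene`, and `median_scale` are all hypothetical, and ratio-of-medians alignment is assumed here only as one common evaluation convention.

```python
from statistics import median


def project(point, focal=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of a camera-frame 3D point (x, y, z) to a pixel (u, v).

    The intrinsics are made-up illustrative values, not taken from any dataset.
    """
    x, y, z = point
    return (focal * x / z + cx, focal * y / z + cy)


def scale_scene(points, s):
    """Scale every 3D point (and therefore every depth) by the factor s."""
    return [(s * x, s * y, s * z) for x, y, z in points]


def median_scale(pred_depths, gt_depths):
    """Align up-to-scale predictions to metric depths via the ratio of medians.

    This needs ground-truth depths at test time, which is exactly what a
    scale-aware model is meant to avoid.
    """
    factor = median(gt_depths) / median(pred_depths)
    return [factor * d for d in pred_depths]


# A scene and a copy that is three times larger and three times farther away.
points = [(1.0, 0.5, 4.0), (-2.0, 1.0, 8.0), (0.5, -0.25, 2.0)]
scaled = scale_scene(points, 3.0)

# Both scenes project to the same pixels, so one image cannot reveal the scale.
pixels = [project(p) for p in points]
pixels_scaled = [project(p) for p in scaled]
same_image = all(
    abs(a - b) < 1e-9
    for pa, pb in zip(pixels, pixels_scaled)
    for a, b in zip(pa, pb)
)

# Relative (up-to-scale) predictions become metric only after alignment.
gt_depths = [z for _, _, z in points]
relative_depths = [z / 2.5 for z in gt_depths]  # unknown global scale factor
aligned_depths = median_scale(relative_depths, gt_depths)
```

In this toy setup `same_image` is true because the factor cancels in `x / z` and `y / z`, and `aligned_depths` matches `gt_depths` only because the ground truth was available to compute the factor.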

Cite

Text

Guizilini et al. "Towards Zero-Shot Scale-Aware Monocular Depth Estimation." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00847

Markdown

[Guizilini et al. "Towards Zero-Shot Scale-Aware Monocular Depth Estimation." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/guizilini2023iccv-zeroshot/) doi:10.1109/ICCV51070.2023.00847

BibTeX

@inproceedings{guizilini2023iccv-zeroshot,
  title     = {{Towards Zero-Shot Scale-Aware Monocular Depth Estimation}},
  author    = {Guizilini, Vitor and Vasiljevic, Igor and Chen, Dian and Ambruș, Rareș and Gaidon, Adrien},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {9233-9243},
  doi       = {10.1109/ICCV51070.2023.00847},
  url       = {https://mlanthology.org/iccv/2023/guizilini2023iccv-zeroshot/}
}