D$^3$epth: Distilling Diffusion Models for Efficient Depth Estimation Through a Two-Stage Approach

Abstract

Diffusion-based monocular depth estimation models demonstrate strong performance with limited supervision by leveraging pre-trained text-to-image models. However, their multi-step inference process and large model size create prohibitive computational overhead for practical applications. To retain the data efficiency of diffusion models while addressing their inference inefficiency, we propose a framework that enhances diffusion-based depth estimation through a two-stage training approach. The first stage distills implicit depth knowledge in the latent space by leveraging the rich representations from pre-trained diffusion models. The second stage refines explicit depth predictions in pixel space using a Hybrid Depth Loss that combines a Shift-Scale Invariant (SSI) loss for global structure preservation with an Edge-aware Gradient Huber loss for fine-grained detail enhancement. The two loss components are adaptively weighted using a dynamic task weighting strategy, balancing structural consistency and boundary precision. We demonstrate that our two-stage distillation approach yields D$^3$epth, an efficient variant that achieves state-of-the-art results while significantly reducing computational requirements. In parallel, our base model D$^2$epth, trained with the enhanced pixel-space depth loss, also surpasses prior state-of-the-art methods across various benchmarks. Overall, these results deliver the accuracy benefits of diffusion-based methods at the efficiency level of traditional data-driven approaches.
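
To make the pixel-space objective from the abstract more concrete, the following is a minimal NumPy sketch of one way such a hybrid loss could be assembled. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the loss formulas, so the SSI term here uses a closed-form least-squares scale-and-shift alignment, the edge-aware weighting is assumed to emphasize pixels with large ground-truth depth gradients, and the dynamic task weighting is stood in for by Dynamic Weight Averaging (DWA). All function names and parameters (`align_scale_shift`, `edge_aware_gradient_huber`, `dwa_weights`, `delta`, `temperature`) are hypothetical.

```python
import numpy as np

def align_scale_shift(pred, target):
    """Least-squares scale s and shift b so that s * pred + b approximates target."""
    p = pred.ravel()
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, target.ravel(), rcond=None)
    return s, b

def huber(r, delta=1.0):
    """Elementwise Huber penalty: quadratic near zero, linear in the tails."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def grads(d):
    """Forward differences along the horizontal and vertical axes."""
    return d[:, 1:] - d[:, :-1], d[1:, :] - d[:-1, :]

def edge_aware_gradient_huber(pred, target, delta=1.0, eps=1e-8):
    """Huber loss on depth gradients, up-weighted at ground-truth edges.

    Assumption: 'edge-aware' is modelled as a weight that grows with the
    magnitude of the ground-truth depth gradient.
    """
    loss = 0.0
    for gp, gt in zip(grads(pred), grads(target)):
        w = 1.0 + np.abs(gt) / (np.abs(gt).mean() + eps)
        loss += np.mean(w * huber(gp - gt, delta))
    return float(loss)

def dwa_weights(prev, prev2, temperature=2.0, eps=1e-8):
    """Dynamic Weight Averaging, used here as a stand-in weighting scheme.

    A term whose loss fell more slowly over the last two steps gets a
    larger weight; the weights sum to the number of terms.
    """
    ratios = np.asarray(prev, dtype=float) / (np.asarray(prev2, dtype=float) + eps)
    e = np.exp(ratios / temperature)
    return len(ratios) * e / e.sum()

def hybrid_depth_loss(pred, target, weights=(1.0, 1.0)):
    """Weighted sum of the SSI term and the edge-aware gradient term."""
    s, b = align_scale_shift(pred, target)
    aligned = s * pred + b
    l_ssi = float(np.mean(np.abs(aligned - target)))
    l_edge = edge_aware_gradient_huber(aligned, target)
    total = weights[0] * l_ssi + weights[1] * l_edge
    return total, (l_ssi, l_edge)
```

In a training loop under these assumptions, one would keep the per-term losses from the two previous steps, pass them to `dwa_weights`, and feed the resulting pair into `hybrid_depth_loss` as `weights`. Because the prediction is aligned before both terms are computed, any prediction that differs from the ground truth only by a positive scale and a shift incurs (numerically) zero loss.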

Cite

Text

Chuang et al. "D$^3$epth: Distilling Diffusion Models for Efficient Depth Estimation Through a Two-Stage Approach." Proceedings of the 17th Asian Conference on Machine Learning, 2025.

Markdown

[Chuang et al. "D$^3$epth: Distilling Diffusion Models for Efficient Depth Estimation Through a Two-Stage Approach." Proceedings of the 17th Asian Conference on Machine Learning, 2025.](https://mlanthology.org/acml/2025/chuang2025acml-3epth/)

BibTeX

@inproceedings{chuang2025acml-3epth,
  title     = {{D$^3$epth: Distilling Diffusion Models for Efficient Depth Estimation Through a Two-Stage Approach}},
  author    = {Chuang, Bo-Chih and Lin, Wei-Tung and Chen, Shang-Fu and Hua, Kailung},
  booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
  year      = {2025},
  pages     = {81--96},
  volume    = {304},
  url       = {https://mlanthology.org/acml/2025/chuang2025acml-3epth/}
}