FlashDepth: Real-Time Streaming Video Depth Estimation at 2k Resolution

Abstract

A versatile video depth estimation model should be consistent and accurate across frames, produce high-resolution depth maps, and support real-time streaming. We propose FlashDepth, a method that satisfies all three requirements, performing depth estimation on a 2044x1148 streaming video at 24 FPS. We show that, with careful modifications to pretrained single-image depth models, these capabilities can be enabled with relatively little data and training. We validate our approach on multiple unseen datasets against state-of-the-art depth models, and find that our method outperforms them in boundary sharpness and speed by a significant margin while maintaining competitive accuracy. We hope our model will enable applications that require high-resolution depth, such as visual effects editing, as well as applications that require online decision-making, such as robotics. We release all code and model weights at https://github.com/Eyeline-Research/FlashDepth.
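
As a rough illustration of what the real-time claim implies, the figures quoted in the abstract (2044x1148 frames at 24 FPS) can be turned into a per-frame time budget and a pixel throughput. The sketch below is plain arithmetic; the function name `streaming_budget` is hypothetical and is not part of the FlashDepth code, and nothing here models the network itself or the hardware it runs on.

```python
def streaming_budget(width: int, height: int, fps: float) -> dict:
    """Return the per-frame time budget and pixel throughput for a video stream.

    Illustrative helper only (an assumption, not from the FlashDepth repository).
    """
    pixels_per_frame = width * height
    return {
        # Number of depth values predicted for each frame.
        "pixels_per_frame": pixels_per_frame,
        # Maximum average time per frame to keep up with the stream.
        "frame_budget_ms": 1000.0 / fps,
        # Depth values that must be produced each second.
        "pixels_per_second": pixels_per_frame * fps,
    }


# Resolution and frame rate as stated in the abstract.
budget = streaming_budget(2044, 1148, 24)
print(budget)
```

Under these numbers, each frame holds about 2.35 million pixels and must be processed in roughly 41.7 ms on average for the model to keep pace with a 24 FPS stream.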

Cite

Text

Chou et al. "FlashDepth: Real-Time Streaming Video Depth Estimation at 2k Resolution." International Conference on Computer Vision, 2025.

Markdown

[Chou et al. "FlashDepth: Real-Time Streaming Video Depth Estimation at 2k Resolution." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/chou2025iccv-flashdepth/)

BibTeX

@inproceedings{chou2025iccv-flashdepth,
  title     = {{FlashDepth: Real-Time Streaming Video Depth Estimation at 2k Resolution}},
  author    = {Chou, Gene and Xian, Wenqi and Yang, Guandao and Abdelfattah, Mohamed and Hariharan, Bharath and Snavely, Noah and Yu, Ning and Debevec, Paul},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {9638--9648},
  url       = {https://mlanthology.org/iccv/2025/chou2025iccv-flashdepth/}
}