Learning Temporally Consistent Video Depth from Video Diffusion Priors
Abstract
This work addresses the challenge of streamed video depth estimation, which requires not only per-frame accuracy but, more importantly, cross-frame consistency. We argue that sharing contextual information between frames or clips is pivotal to fostering temporal consistency. Therefore, we reformulate depth prediction as a conditional generation problem that provides contextual information both within a clip and across clips. Specifically, to provide cross-clip context, we propose a consistent context-aware training and inference strategy for arbitrarily long videos: during training, we sample an independent noise level for each frame within a clip; during inference, we use a sliding-window strategy and initialize the overlapping frames with previously predicted frames, without adding noise. Moreover, we design an effective training strategy to provide context within a clip. Extensive experimental results validate our design choices and demonstrate the superiority of our approach, dubbed ChronoDepth. Project page: https://xdimlab.github.io/ChronoDepth/.
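The two cross-clip mechanisms named in the abstract (independent per-frame noise levels during training, and sliding-window inference whose overlapping frames are seeded with earlier predictions at zero noise) can be illustrated with a minimal sketch. This is not ChronoDepth's implementation: frames are toy scalars, `toy_denoiser` is a hypothetical stand-in for a video diffusion model, and the values 1000 training timesteps and `t_max = 999` are illustrative assumptions, not settings taken from the paper.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical denoiser signature (an assumption, not the authors' API):
# (clip_frames, clean_context_depths, per_frame_timesteps) -> one depth per clip frame
Denoiser = Callable[[Sequence[float], List[float], List[int]], List[float]]


def sample_per_frame_noise_levels(num_frames: int, num_train_timesteps: int = 1000,
                                  rng: Optional[random.Random] = None) -> List[int]:
    """Training: draw an independent diffusion timestep for each frame of a clip,
    rather than a single timestep shared by the whole clip."""
    rng = rng or random.Random()
    return [rng.randrange(num_train_timesteps) for _ in range(num_frames)]


def sliding_window_inference(frames: Sequence[float], clip_len: int, overlap: int,
                             denoise_clip: Denoiser, t_max: int = 999) -> List[float]:
    """Inference: slide a window of clip_len frames that shares `overlap` frames
    with the previous window. Overlapping frames are filled with earlier
    predictions at timestep 0 (no added noise); only new frames start at t_max."""
    if not 0 <= overlap < clip_len:
        raise ValueError("need 0 <= overlap < clip_len")
    stride = clip_len - overlap
    depths: List[float] = []
    start = 0
    while True:
        clip = frames[start:start + clip_len]
        # Clean context: predictions already made for the overlapping frames.
        context = depths[start:start + overlap] if start > 0 else []
        timesteps = [0] * len(context) + [t_max] * (len(clip) - len(context))
        pred = denoise_clip(clip, context, timesteps)
        # Keep the earlier predictions for overlap frames; append only new frames.
        depths.extend(pred[len(context):])
        if start + clip_len >= len(frames):
            return depths
        start += stride


def toy_denoiser(clip, context, timesteps):
    """Hypothetical stand-in model: returns clean context frames unchanged and a
    fake 'depth' (half the frame value) for frames that start from noise."""
    return [context[i] if t == 0 else 0.5 * f
            for i, (f, t) in enumerate(zip(clip, timesteps))]


print(sample_per_frame_noise_levels(4, rng=random.Random(0)))
print(sliding_window_inference(list(range(10)), clip_len=4, overlap=2,
                               denoise_clip=toy_denoiser))
# [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
```

With `clip_len=4` and `overlap=2`, each window after the first passes two already-predicted frames at timestep 0 and denoises only two new frames, so the output covers every input frame exactly once while each new clip is conditioned on the previous clip's results.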
Cite
Text
Shao et al. "Learning Temporally Consistent Video Depth from Video Diffusion Priors." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02127
Markdown
[Shao et al. "Learning Temporally Consistent Video Depth from Video Diffusion Priors." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/shao2025cvpr-learning/) doi:10.1109/CVPR52734.2025.02127
BibTeX
@inproceedings{shao2025cvpr-learning,
title = {{Learning Temporally Consistent Video Depth from Video Diffusion Priors}},
author = {Shao, Jiahao and Yang, Yuanbo and Zhou, Hongyu and Zhang, Youmin and Shen, Yujun and Guizilini, Vitor and Wang, Yue and Poggi, Matteo and Liao, Yiyi},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {22841-22852},
doi = {10.1109/CVPR52734.2025.02127},
url = {https://mlanthology.org/cvpr/2025/shao2025cvpr-learning/}
}