Temporally Consistent Depth Estimation in Videos with Recurrent Architectures

Abstract

Convolutional networks trained on large RGB-D datasets have enabled depth estimation from a single image. Many works on automotive applications rely on such approaches. However, all existing methods work in a frame-by-frame manner when applied to videos, which leads to inconsistent depth estimates over time. In this paper, we introduce for the first time an approach that yields temporally consistent depth estimates over multiple frames of a video. This is achieved by a dedicated architecture based on convolutional LSTM units and layer normalization. Our approach achieves superior performance on several error metrics when compared to independent frame processing. This is also reflected in the improved quality of the reconstructed multi-view point clouds.
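The abstract's key architectural ingredients are convolutional LSTM units combined with layer normalization, which carry depth information across frames. The paper's exact layer configuration is not given here, so the following is only an illustrative NumPy sketch of a single ConvLSTM cell with layer normalization applied to the gate pre-activations; all kernel sizes, channel counts, and initializations are placeholder assumptions.

```python
import numpy as np

def conv2d_same(x, k):
    # x: (C_in, H, W); k: (C_out, C_in, kh, kw); zero-padded "same" convolution.
    c_out, c_in, kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    H, W = x.shape[1:]
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(c_in):
            for di in range(kh):
                for dj in range(kw):
                    out[o] += k[o, i, di, dj] * xp[i, di:di + H, dj:dj + W]
    return out

def layer_norm(x, eps=1e-5):
    # Normalize over all features of the activation map (no learned scale/shift here).
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """Minimal ConvLSTM cell sketch: gates computed by convolution over
    the concatenated input frame and previous hidden state, with layer
    normalization on the gate pre-activations (a placeholder design,
    not the paper's exact architecture)."""
    def __init__(self, c_in, c_hid, k=3, seed=0):
        rng = np.random.default_rng(seed)
        # One kernel stack producing all four gates (i, f, o, g) at once.
        self.w = rng.normal(0.0, 0.1, (4 * c_hid, c_in + c_hid, k, k))

    def step(self, x, h, c):
        z = layer_norm(conv2d_same(np.concatenate([x, h], axis=0), self.w))
        i, f, o, g = np.split(z, 4, axis=0)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # update cell state
        h = sigmoid(o) * np.tanh(c)                    # new hidden state
        return h, c

# Run the cell over a short synthetic "video": the recurrent state lets
# each depth-relevant feature map depend on previous frames.
cell = ConvLSTMCell(c_in=3, c_hid=8)
h = np.zeros((8, 16, 16))
c = np.zeros_like(h)
for t in range(4):
    frame = np.random.default_rng(t).normal(size=(3, 16, 16))
    h, c = cell.step(frame, h, c)
```

In a full depth network, such a cell would sit between convolutional encoder and decoder stages so that the per-frame depth prediction is conditioned on the accumulated state rather than on the current frame alone.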

Cite

Text

Tananaev et al. "Temporally Consistent Depth Estimation in Videos with Recurrent Architectures." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11015-4_52

Markdown

[Tananaev et al. "Temporally Consistent Depth Estimation in Videos with Recurrent Architectures." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/tananaev2018eccvw-temporally/) doi:10.1007/978-3-030-11015-4_52

BibTeX

@inproceedings{tananaev2018eccvw-temporally,
  title     = {{Temporally Consistent Depth Estimation in Videos with Recurrent Architectures}},
  author    = {Tananaev, Denis and Zhou, Huizhong and Ummenhofer, Benjamin and Brox, Thomas},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2018},
  pages     = {689--701},
  doi       = {10.1007/978-3-030-11015-4_52},
  url       = {https://mlanthology.org/eccvw/2018/tananaev2018eccvw-temporally/}
}