DeepV2D: Video to Depth with Differentiable Structure from Motion

Abstract

We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representation ability of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, which are converted into trainable modules and combined into an end-to-end differentiable architecture. DeepV2D interleaves two stages: motion estimation and depth estimation. During inference, the two stages are alternated, converging to an accurate depth estimate.
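The alternating inference described in the abstract can be sketched as a simple fixed-point loop. The update functions below are illustrative stand-ins, not the paper's learned modules: in DeepV2D the motion step would re-estimate camera poses given the current depth, and the depth step would predict depth given the current poses.

```python
def motion_step(depth, frames):
    # Stand-in for the learned motion module: in DeepV2D this would
    # re-estimate camera poses from the frames given the current depth.
    return sum(frames) / len(frames) + 0.1 * depth

def depth_step(motion, frames):
    # Stand-in for the learned depth module: in DeepV2D this would
    # predict depth from the frames given the current motion estimate.
    return 0.5 * motion + 1.0

def infer_depth(frames, iterations=8):
    # Alternate motion and depth estimation, as in DeepV2D inference,
    # starting from a coarse depth initialization.
    depth = 1.0
    for _ in range(iterations):
        motion = motion_step(depth, frames)
        depth = depth_step(motion, frames)
    return depth
```

Because each round of updates contracts toward a fixed point in this toy setup, the alternation converges quickly; the paper reports the same qualitative behavior for the learned modules.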

Cite

Text

Teed and Deng. "DeepV2D: Video to Depth with Differentiable Structure from Motion." International Conference on Learning Representations, 2020.

Markdown

[Teed and Deng. "DeepV2D: Video to Depth with Differentiable Structure from Motion." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/teed2020iclr-deepv2d/)

BibTeX

@inproceedings{teed2020iclr-deepv2d,
  title     = {{DeepV2D: Video to Depth with Differentiable Structure from Motion}},
  author    = {Teed, Zachary and Deng, Jia},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/teed2020iclr-deepv2d/}
}