End-to-End Learning of Geometry and Context for Deep Stereo Regression

Abstract

We propose a novel deep learning architecture for regressing disparity from a rectified pair of stereo images. We leverage knowledge of the problem's geometry to form a cost volume using deep feature representations. We learn to incorporate contextual information using 3-D convolutions over this volume. Disparity values are regressed from the cost volume using a proposed differentiable soft argmin operation, which allows us to train our method end-to-end to sub-pixel accuracy without any additional post-processing or regularization. We evaluate our method on the Scene Flow and KITTI datasets. On KITTI, we set a new state of the art while being significantly faster than competing approaches.
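The soft argmin step mentioned in the abstract can be illustrated with a small sketch. It assumes the usual formulation, in which a softmax over the negated matching costs gives a probability per candidate disparity and the output is the probability-weighted mean. The function name `soft_argmin` and the example cost values are hypothetical and are not taken from the authors' code. A real implementation would apply this per pixel over a full cost volume in a tensor library.

```python
import math

def soft_argmin(costs):
    """Expected disparity under a softmax over negated matching costs.

    `costs` holds one matching cost per candidate disparity 0..D-1
    for a single pixel (illustrative layout, assumed here).
    """
    # Lower cost should mean higher probability, so negate before the softmax.
    logits = [-c for c in costs]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    # Probability-weighted mean of the candidate disparities.
    return sum(d * w / total for d, w in enumerate(weights))

# Hypothetical cost curve over 5 candidate disparities for one pixel.
costs = [9.0, 4.0, 0.5, 0.7, 6.0]
disparity = soft_argmin(costs)
print(disparity)  # a fractional value between 2 and 3
```

Because the result is a weighted average rather than a hard index, it can fall between integer disparities, which is what makes sub-pixel estimates possible, and every step is differentiable with respect to the costs.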

Cite

Text

Kendall et al. "End-to-End Learning of Geometry and Context for Deep Stereo Regression." International Conference on Computer Vision, 2017. doi:10.1109/ICCV.2017.17

Markdown

[Kendall et al. "End-to-End Learning of Geometry and Context for Deep Stereo Regression." International Conference on Computer Vision, 2017.](https://mlanthology.org/iccv/2017/kendall2017iccv-endtoend/) doi:10.1109/ICCV.2017.17

BibTeX

@inproceedings{kendall2017iccv-endtoend,
  title     = {{End-to-End Learning of Geometry and Context for Deep Stereo Regression}},
  author    = {Kendall, Alex and Martirosyan, Hayk and Dasgupta, Saumitro and Henry, Peter and Kennedy, Ryan and Bachrach, Abraham and Bry, Adam},
  booktitle = {International Conference on Computer Vision},
  year      = {2017},
  doi       = {10.1109/ICCV.2017.17},
  url       = {https://mlanthology.org/iccv/2017/kendall2017iccv-endtoend/}
}