StereoNet: Guided Hierarchical Refinement for Real-Time Edge-Aware Depth Prediction

Abstract

This paper presents StereoNet, the first end-to-end deep architecture for real-time stereo matching that runs at 60 fps on an NVIDIA Titan X, producing high-quality, edge-preserved, quantization-free depth maps. A key insight of this paper is that the network achieves a sub-pixel matching precision that is an order of magnitude higher than that of traditional stereo matching approaches. This allows us to achieve real-time performance by using a very low resolution cost volume that encodes all the information needed to achieve high depth precision. Spatial precision is achieved by employing a learned edge-aware upsampling function. Our model uses a Siamese network to extract features from the left and right images. A first estimate of the disparity is computed in a very low resolution cost volume; the model then hierarchically re-introduces high-frequency details through a learned upsampling function that uses compact pixel-to-pixel refinement networks. Leveraging the color input as a guide, this function is capable of producing high-quality edge-aware output. We achieve compelling results on multiple benchmarks, showing how the proposed method offers extreme flexibility at an acceptable computational budget.
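
The low-resolution cost volume step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `build_cost_volume` and `soft_argmin`, the L1 feature difference used as matching cost, and the softmax-weighted disparity readout are all assumptions chosen for illustration, and the learned Siamese features and refinement networks are replaced here by plain arrays.

```python
import numpy as np

def build_cost_volume(left_feat, right_feat, max_disp):
    """Build a (D, H, W) matching cost from two (H, W, C) feature maps.

    Assumption: cost is the L1 distance between the left feature at
    column x and the right feature at column x - d.
    """
    h, w, _ = left_feat.shape
    # Positions with no valid match (x < d) keep an infinite cost.
    cost = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        diff = left_feat[:, d:, :] - right_feat[:, : w - d, :]
        cost[d, :, d:] = np.abs(diff).sum(axis=-1)
    return cost

def soft_argmin(cost):
    """Sub-pixel disparity as a softmax-weighted mean over candidates."""
    neg = -cost
    # Subtract the per-pixel maximum for numerical stability.
    neg = neg - neg.max(axis=0, keepdims=True)
    weights = np.exp(neg)  # infinite costs receive zero weight
    weights /= weights.sum(axis=0, keepdims=True)
    disps = np.arange(cost.shape[0], dtype=np.float64).reshape(-1, 1, 1)
    return (weights * disps).sum(axis=0)
```

Because a disparity estimated on a downsampled grid is multiplied by the downsampling factor when brought back to full resolution, any error in the low-resolution estimate is amplified by the same factor. That is why a coarse cost volume only works if the matching is precise well below one pixel, which a continuous readout of this kind allows and a hard argmin over integer candidates does not.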

Cite

Text

Khamis et al. "StereoNet: Guided Hierarchical Refinement for Real-Time Edge-Aware Depth Prediction." Proceedings of the European Conference on Computer Vision (ECCV), 2018. doi:10.1007/978-3-030-01267-0_35

Markdown

[Khamis et al. "StereoNet: Guided Hierarchical Refinement for Real-Time Edge-Aware Depth Prediction." Proceedings of the European Conference on Computer Vision (ECCV), 2018.](https://mlanthology.org/eccv/2018/khamis2018eccv-stereonet/) doi:10.1007/978-3-030-01267-0_35

BibTeX

@inproceedings{khamis2018eccv-stereonet,
  title     = {{StereoNet: Guided Hierarchical Refinement for Real-Time Edge-Aware Depth Prediction}},
  author    = {Khamis, Sameh and Fanello, Sean and Rhemann, Christoph and Kowdle, Adarsh and Valentin, Julien and Izadi, Shahram},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2018},
  doi       = {10.1007/978-3-030-01267-0_35},
  url       = {https://mlanthology.org/eccv/2018/khamis2018eccv-stereonet/}
}