Revealing the Reciprocal Relations Between Self-Supervised Stereo and Monocular Depth Estimation

Abstract

Current self-supervised depth estimation algorithms mainly focus on either stereo or monocular estimation alone, neglecting the reciprocal relations between them. In this paper, we propose a simple yet effective framework that improves both stereo and monocular depth estimation by leveraging the complementary knowledge of the two tasks. Our approach consists of three stages. First, the proposed stereo matching network, termed StereoNet, is trained on image pairs in a self-supervised manner. Second, we introduce an occlusion-aware distillation (OA Distillation) module, which uses the depths predicted by StereoNet in non-occluded regions to train our monocular depth estimation network, named SingleNet. Third, we design an occlusion-aware fusion (OA Fusion) module, which generates more reliable depths by fusing the depths estimated by StereoNet and SingleNet given the occlusion map. Furthermore, we take the fused depths as pseudo labels to supervise StereoNet in turn, which further improves StereoNet's performance. Extensive experiments on the KITTI dataset demonstrate the effectiveness of the proposed framework. We achieve new state-of-the-art (SOTA) performance on both stereo and monocular depth estimation tasks.
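
The occlusion-aware distillation and fusion steps described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the abstract does not specify the loss or the fusion rule, so the masked L1 loss, the hard per-pixel selection, and the function names `oa_distillation_loss` and `oa_fusion` are all assumptions made here for illustration. The actual OA Fusion module may well be learned rather than a fixed rule.

```python
import numpy as np

def oa_distillation_loss(mono_depth, stereo_depth, occluded):
    """Mean L1 gap between student (mono) and teacher (stereo) depths.

    Assumption: only non-occluded pixels contribute, since the stereo
    prediction is treated as unreliable where one view cannot see the scene.
    """
    valid = ~occluded
    if not valid.any():
        return 0.0
    return float(np.abs(mono_depth - stereo_depth)[valid].mean())

def oa_fusion(stereo_depth, mono_depth, occluded):
    """Hypothetical hard fusion rule: trust the monocular depth in occluded
    pixels and the stereo depth everywhere else."""
    return np.where(occluded, mono_depth, stereo_depth)

# Toy 2x2 depth maps (arbitrary units); the bottom-right pixel is occluded.
stereo = np.array([[10.0, 12.0], [8.0, 50.0]])
mono = np.array([[11.0, 12.0], [9.0, 20.0]])
occluded = np.array([[False, False], [False, True]])

loss = oa_distillation_loss(mono, stereo, occluded)
fused = oa_fusion(stereo, mono, occluded)
```

In this toy case the occluded pixel, where the stereo estimate is an outlier, is excluded from the distillation loss and replaced by the monocular estimate in the fused map. Under the framework described in the abstract, a fused map of this kind would then serve as a pseudo label for further training of StereoNet.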

Cite

Text

Chen et al. "Revealing the Reciprocal Relations Between Self-Supervised Stereo and Monocular Depth Estimation." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.01524

Markdown

[Chen et al. "Revealing the Reciprocal Relations Between Self-Supervised Stereo and Monocular Depth Estimation." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/chen2021iccv-revealing/) doi:10.1109/ICCV48922.2021.01524

BibTeX

@inproceedings{chen2021iccv-revealing,
  title     = {{Revealing the Reciprocal Relations Between Self-Supervised Stereo and Monocular Depth Estimation}},
  author    = {Chen, Zhi and Ye, Xiaoqing and Yang, Wei and Xu, Zhenbo and Tan, Xiao and Zou, Zhikang and Ding, Errui and Zhang, Xinming and Huang, Liusheng},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {15529-15538},
  doi       = {10.1109/ICCV48922.2021.01524},
  url       = {https://mlanthology.org/iccv/2021/chen2021iccv-revealing/}
}