Learning Single Camera Depth Estimation Using Dual-Pixels

Abstract

Deep learning techniques have enabled rapid progress in monocular depth estimation, but their quality is limited by the ill-posed nature of the problem and the scarcity of high-quality datasets. We estimate depth from a single camera by leveraging the dual-pixel auto-focus hardware that is increasingly common on modern camera sensors. Classic stereo algorithms and prior learning-based depth estimation techniques underperform when applied to this dual-pixel data, the former due to too-strong assumptions about RGB image matching, and the latter due to not leveraging an understanding of the optics of dual-pixel image formation. To allow learning-based methods to work well on dual-pixel imagery, we identify an inherent ambiguity in the depth estimated from dual-pixel cues, and develop an approach to estimate depth up to this ambiguity. Using our approach, existing monocular depth estimation techniques can be effectively applied to dual-pixel data, and much smaller models can be constructed that still infer high-quality depth. To demonstrate this, we capture a large dataset of in-the-wild 5-viewpoint RGB images paired with corresponding dual-pixel data, and show how view supervision with this data can be used to learn depth up to the unknown ambiguities. On our new task, our model is 30% more accurate than any prior work on learning-based monocular or stereoscopic depth estimation.
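
The abstract does not spell out the form of the ambiguity. As a rough illustration only, the sketch below assumes it is an unknown scale and offset (an affine transform) applied to the predicted inverse depth, and shows what scoring a prediction "up to this ambiguity" could look like: fit the best scale and offset by least squares, then measure the remaining error. The function names `fit_affine` and `affine_invariant_rmse` are hypothetical, and this is not presented as the authors' exact loss or evaluation metric.

```python
from typing import Sequence, Tuple


def fit_affine(pred: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Return (a, b) minimizing sum((a * pred + b - target) ** 2).

    Assumption for illustration: the prediction is only meaningful up to an
    unknown scale `a` and offset `b`, so both are fit before comparing.
    """
    n = len(pred)
    if n != len(target) or n < 2:
        raise ValueError("need two equally sized sequences with at least 2 samples")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0.0:
        # A constant prediction carries no scale information; fit the offset only.
        return 0.0, mean_t
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    a = cov / var_p
    b = mean_t - a * mean_p
    return a, b


def affine_invariant_rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """RMSE between target and the prediction after the best affine alignment."""
    a, b = fit_affine(pred, target)
    n = len(pred)
    return (sum((a * p + b - t) ** 2 for p, t in zip(pred, target)) / n) ** 0.5


if __name__ == "__main__":
    # Hypothetical inverse-depth values: the target is an exact affine
    # transform of the prediction, so the aligned error should be near zero.
    predicted = [0.1, 0.2, 0.4, 0.8]
    reference = [2.0 * p + 0.5 for p in predicted]
    print(fit_affine(predicted, reference))
    print(affine_invariant_rmse(predicted, reference))
```

Under this assumption, two predictions that differ only by scale and offset receive the same score, which is the sense in which a model can be judged without resolving the ambiguity itself.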

Cite

Text

Garg et al. "Learning Single Camera Depth Estimation Using Dual-Pixels." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00772

Markdown

[Garg et al. "Learning Single Camera Depth Estimation Using Dual-Pixels." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/garg2019iccv-learning/) doi:10.1109/ICCV.2019.00772

BibTeX

@inproceedings{garg2019iccv-learning,
  title     = {{Learning Single Camera Depth Estimation Using Dual-Pixels}},
  author    = {Garg, Rahul and Wadhwa, Neal and Ansari, Sameer and Barron, Jonathan T.},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2019},
  doi       = {10.1109/ICCV.2019.00772},
  url       = {https://mlanthology.org/iccv/2019/garg2019iccv-learning/}
}