Learning to Recover 3D Scene Shape from a Single Image
Abstract
Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and possible unknown camera focal length. We investigate this problem in detail and propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then uses 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to enhance depth prediction models trained on mixed datasets. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot dataset generalization. Code is available at: https://git.io/Depth.
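To illustrate why the depth shift and focal length matter for 3D shape, the following is a minimal sketch of standard pinhole back-projection, not the authors' implementation. The function name `unproject_depth`, its parameters, the additive sign convention for the shift, and the principal point placed at the image center are all assumptions made for illustration.

```python
import numpy as np


def unproject_depth(depth, shift, focal_length):
    """Back-project a depth map to an (H*W, 3) point cloud (pinhole model).

    Hypothetical helper: `shift` is added to the predicted depth, and the
    principal point is assumed to lie at the image center.
    """
    h, w = depth.shape
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0  # assumed principal point
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth + shift  # apply the depth shift before back-projection
    x = (u - cx) / focal_length * z
    y = (v - cy) / focal_length * z
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)


# Toy usage: a flat 3x3 depth map with a shift of 1 and a focal length of 2 pixels.
points = unproject_depth(np.ones((3, 3)), shift=1.0, focal_length=2.0)
```

Because `x` and `y` scale with `z / focal_length`, a wrong shift or focal length distorts the recovered geometry rather than merely rescaling it, which is the failure mode the abstract describes.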
Cite
Text
Yin et al. "Learning to Recover 3D Scene Shape from a Single Image." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00027
Markdown
[Yin et al. "Learning to Recover 3D Scene Shape from a Single Image." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/yin2021cvpr-learning-a/) doi:10.1109/CVPR46437.2021.00027
BibTeX
@inproceedings{yin2021cvpr-learning-a,
title = {{Learning to Recover 3D Scene Shape from a Single Image}},
author = {Yin, Wei and Zhang, Jianming and Wang, Oliver and Niklaus, Simon and Mai, Long and Chen, Simon and Shen, Chunhua},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2021},
pages = {204-213},
doi = {10.1109/CVPR46437.2021.00027},
url = {https://mlanthology.org/cvpr/2021/yin2021cvpr-learning-a/}
}