Structure-Guided Ranking Loss for Single Image Depth Prediction
Abstract
Single image depth prediction is a challenging task due to its ill-posed nature and the difficulty of capturing ground truth for supervision. Large-scale disparity data generated from stereo photos and 3D videos is a promising source of supervision; however, such disparity data can only approximate the inverse ground truth depth up to an affine transformation. To learn more effectively from such pseudo-depth data, we propose to use a simple pair-wise ranking loss with a novel sampling strategy. Instead of randomly sampling point pairs, we guide the sampling to better characterize the structure of important regions based on low-level edge maps and high-level object instance masks. We show that the pair-wise ranking loss, combined with our structure-guided sampling strategies, can significantly improve the quality of depth map prediction. In addition, we introduce a new relative depth dataset of about 21K diverse high-resolution web stereo photos to enhance the generalization ability of our model. In experiments, we conduct cross-dataset evaluation on six benchmark datasets and show that our method consistently improves over the baselines, leading to superior quantitative and qualitative results.
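As a rough illustration of the pair-wise ranking idea mentioned in the abstract, the sketch below shows a generic ranking loss over sampled point pairs. It is not taken from the paper: the function names, the ratio threshold `tau`, and the squared-difference penalty for "equal" pairs are all assumptions, and the pairs are assumed to have been sampled already (randomly or otherwise). Ground-truth values are assumed to be strictly positive.

```python
import numpy as np

def ordinal_labels(gt_a, gt_b, tau=0.03):
    """Hypothetical helper: derive ordinal labels from ground-truth values.

    Returns +1 where gt_a is clearly larger, -1 where clearly smaller,
    and 0 where the two values are within a ratio tolerance (tau is an
    assumed threshold, not a value from the source).
    """
    ratio = np.asarray(gt_a, dtype=float) / np.asarray(gt_b, dtype=float)
    labels = np.zeros_like(ratio)
    labels[ratio >= 1.0 + tau] = 1.0
    labels[ratio <= 1.0 / (1.0 + tau)] = -1.0
    return labels

def pairwise_ranking_loss(pred_a, pred_b, labels):
    """Generic pair-wise ranking loss (a sketch, assumed form).

    Ordered pairs use log(1 + exp(-label * (pred_a - pred_b)));
    pairs labelled 0 are pulled together with a squared difference.
    """
    diff = np.asarray(pred_a, dtype=float) - np.asarray(pred_b, dtype=float)
    labels = np.asarray(labels, dtype=float)
    ranked = np.logaddexp(0.0, -labels * diff)  # numerically stable softplus
    equal = diff ** 2
    return float(np.mean(np.where(labels != 0, ranked, equal)))

# Three illustrative pairs: a > b, a < b, a == b.
labels = ordinal_labels([2.0, 1.0, 1.0], [1.0, 2.0, 1.0])
good = pairwise_ranking_loss([3.0, 0.0, 1.0], [0.0, 3.0, 1.0], labels)
bad = pairwise_ranking_loss([0.0, 3.0, 1.0], [3.0, 0.0, 1.0], labels)
```

Because the loss depends only on the ordering of each pair, it does not change when the ground truth is scaled by a positive constant, which is one reason a ranking objective can suit supervision whose scale is unknown. Predictions that agree with the ground-truth ordering (`good`) incur a lower loss than predictions that reverse it (`bad`).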
Cite
Text
Xian et al. "Structure-Guided Ranking Loss for Single Image Depth Prediction." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.00069
Markdown
[Xian et al. "Structure-Guided Ranking Loss for Single Image Depth Prediction." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/xian2020cvpr-structureguided/) doi:10.1109/CVPR42600.2020.00069
BibTeX
@inproceedings{xian2020cvpr-structureguided,
title = {{Structure-Guided Ranking Loss for Single Image Depth Prediction}},
author = {Xian, Ke and Zhang, Jianming and Wang, Oliver and Mai, Long and Lin, Zhe and Cao, Zhiguo},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2020},
doi = {10.1109/CVPR42600.2020.00069},
url = {https://mlanthology.org/cvpr/2020/xian2020cvpr-structureguided/}
}