S2R-DepthNet: Learning a Generalizable Depth-Specific Structural Representation
Abstract
Humans can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that spatial structure plays a fundamental role in understanding scene depth. We are the first to explore learning a depth-specific structural representation, which captures the essential features for depth estimation and ignores irrelevant style information. Our S2R-DepthNet (Synthetic-to-Real DepthNet) generalizes well to unseen real-world data directly, even though it is trained only on synthetic data. S2R-DepthNet consists of: a) a Structure Extraction (STE) module, which extracts a domain-invariant structural representation from an image by disentangling the image into domain-invariant structure and domain-specific style components, b) a Depth-Specific Attention (DSA) module, which learns task-specific knowledge to suppress depth-irrelevant structures for better depth estimation and generalization, and c) a Depth Prediction (DP) module, which predicts depth from the depth-specific representation. Without access to any real-world images, our method even outperforms state-of-the-art unsupervised domain adaptation methods that use real-world images of the target domain for training. In addition, when using a small amount of labeled real-world data, we achieve state-of-the-art performance under the semi-supervised setting.
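The abstract describes a three-stage pipeline (STE → DSA → DP). The following PyTorch sketch only illustrates how such modules might compose into one network; the layer counts, channel widths, class names, and the multiplicative attention are assumptions for illustration, not the architecture from the paper.

import torch
import torch.nn as nn

class StructureExtraction(nn.Module):
    """STE: maps an image to a domain-invariant structure map (assumed small conv net)."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, image):
        # Style information is discarded; only the structure map is passed on.
        return self.net(image)

class DepthSpecificAttention(nn.Module):
    """DSA: predicts an attention map that suppresses depth-irrelevant structures."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, structure):
        # Element-wise gating of the structure map (an assumed formulation).
        return structure * self.net(structure)

class DepthPrediction(nn.Module):
    """DP: regresses depth from the depth-specific structural representation."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, rep):
        return self.net(rep)

class S2RDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.ste = StructureExtraction()
        self.dsa = DepthSpecificAttention()
        self.dp = DepthPrediction()

    def forward(self, image):
        structure = self.ste(image)           # domain-invariant structure
        depth_specific = self.dsa(structure)  # depth-irrelevant structure suppressed
        return self.dp(depth_specific)        # predicted depth map

# Trained on synthetic (image, depth) pairs; applied directly to real images.
depth = S2RDepthNet()(torch.randn(1, 3, 224, 224))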
Cite
Text
Chen et al. "S2R-DepthNet: Learning a Generalizable Depth-Specific Structural Representation." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00305
Markdown
[Chen et al. "S2R-DepthNet: Learning a Generalizable Depth-Specific Structural Representation." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/chen2021cvpr-s2rdepthnet/) doi:10.1109/CVPR46437.2021.00305
BibTeX
@inproceedings{chen2021cvpr-s2rdepthnet,
title = {{S2R-DepthNet: Learning a Generalizable Depth-Specific Structural Representation}},
author = {Chen, Xiaotian and Wang, Yuwang and Chen, Xuejin and Zeng, Wenjun},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2021},
  pages = {3034--3043},
doi = {10.1109/CVPR46437.2021.00305},
url = {https://mlanthology.org/cvpr/2021/chen2021cvpr-s2rdepthnet/}
}