Learning 3D Scene Semantics and Structure from a Single Depth Image
Abstract
In this paper, we aim to understand the semantics and 3D structure of a scene from a single depth image. Recent methods based on deep neural networks aim to simultaneously learn object class labels and infer the 3D shape of a scene represented by a large voxel grid. However, individual objects within the scene are usually represented by only a few voxels, leading to a loss of geometric detail. In addition, significant computational and memory resources are required to process the large-scale voxel grid of a whole scene. To address this, we propose an efficient and holistic pipeline, 3R-Depth, to simultaneously learn the semantics and structure of a scene from a single depth image. Our key idea is to deeply fuse an efficient 3D shape estimator with existing recognition (e.g., ResNets) and segmentation (e.g., Mask R-CNN) techniques. Object-level semantics and latent feature maps are extracted and then fed to a shape estimator to recover the 3D shape. Extensive experiments are conducted on large-scale synthesized indoor scene datasets, quantitatively and qualitatively demonstrating the merits and superior performance of 3R-Depth.
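The fusion described above can be sketched as a simple dataflow: a recognition backbone produces a latent feature map from the depth image, a segmentation head yields per-object masks, and the pooled object-level features are fed to a shape estimator that outputs a voxel grid per object. The following is a minimal NumPy sketch of that dataflow only; all function bodies are hypothetical stand-ins (the paper does not specify these implementations), not the authors' networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def backbone_features(depth):
    # Stand-in for a recognition backbone (e.g., a ResNet):
    # downsample the depth image into an (H/8, W/8, 16) latent feature map.
    h, w = depth.shape
    pooled = depth.reshape(h // 8, 8, w // 8, 8).mean(axis=(1, 3))
    return pooled[..., None].repeat(16, axis=-1)

def segment_objects(feat):
    # Stand-in for a Mask R-CNN-style segmentation head: return
    # per-object binary masks over the feature map (two dummy masks here).
    h, w, _ = feat.shape
    m1 = np.zeros((h, w), dtype=bool)
    m1[: h // 2] = True
    return [m1, ~m1]

def shape_estimator(obj_feat):
    # Stand-in for the per-object 3D shape estimator: map a pooled
    # object feature vector to a 32^3 occupancy voxel grid.
    logits = rng.standard_normal((32, 32, 32)) + obj_feat.mean()
    return (logits > 0).astype(np.float32)

def pipeline(depth):
    feat = backbone_features(depth)
    shapes = []
    for mask in segment_objects(feat):
        pooled = feat[mask].mean(axis=0)  # object-level latent feature
        shapes.append(shape_estimator(pooled))
    return shapes

shapes = pipeline(rng.random((64, 64)))
print(len(shapes), shapes[0].shape)  # one 32^3 voxel grid per detected object
```

The point of the sketch is the per-object route: instead of one coarse voxel grid for the whole scene, each segmented object gets its own full-resolution grid, which is why geometric detail is preserved at lower cost.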
Cite
Text
Yang et al. "Learning 3D Scene Semantics and Structure from a Single Depth Image." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2018. doi:10.1109/CVPRW.2018.00069
Markdown
[Yang et al. "Learning 3D Scene Semantics and Structure from a Single Depth Image." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2018.](https://mlanthology.org/cvprw/2018/yang2018cvprw-learning/) doi:10.1109/CVPRW.2018.00069
BibTeX
@inproceedings{yang2018cvprw-learning,
title = {{Learning 3D Scene Semantics and Structure from a Single Depth Image}},
author = {Yang, Bo and Lai, Zihang and Lu, Xiaoxuan and Lin, Shuyu and Wen, Hongkai and Markham, Andrew and Trigoni, Niki},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2018},
pages = {309--312},
doi = {10.1109/CVPRW.2018.00069},
url = {https://mlanthology.org/cvprw/2018/yang2018cvprw-learning/}
}