View-Volume Network for Semantic Scene Completion from a Single Depth Image
Abstract
We introduce a View-Volume convolutional neural network (VVNet) for inferring the occupancy and semantic labels of a volumetric 3D scene from a single depth image. Our method extracts detailed geometric features from the input depth image with a 2D view CNN and then projects the features into a 3D volume, according to the input depth map, via a projection layer. We then learn the 3D context of the scene with a 3D volume CNN that computes the resulting volumetric occupancy and semantic labels. By combining 2D and 3D representations, VVNet reduces the computational cost, enables feature extraction from multi-channel high-resolution inputs, and thus significantly improves the accuracy of the results. We validate our method and demonstrate its efficiency and effectiveness on both the synthetic SUNCG and real NYU datasets.
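The key step described above is the projection layer that lifts per-pixel 2D features into a 3D voxel grid using the depth map. A minimal NumPy sketch of this idea follows; the function name, intrinsics convention, and averaging scheme are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def project_features_to_volume(features, depth, K, grid_shape, voxel_size, origin):
    """Hypothetical sketch of a view-to-volume projection layer.

    Each pixel's 2D feature vector is back-projected into camera space
    using the depth map and intrinsics K, then scattered (with averaging)
    into the voxel it falls in.
    """
    H, W, C = features.shape
    volume = np.zeros(grid_shape + (C,), dtype=features.dtype)
    count = np.zeros(grid_shape, dtype=np.int32)

    # Pixel coordinates: us varies along columns, vs along rows.
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    x = (us - K[0, 2]) * z / K[0, 0]
    y = (vs - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z], axis=-1)  # (H, W, 3) camera-space points

    # Convert points to voxel indices and keep those inside the grid.
    idx = np.floor((pts - origin) / voxel_size).astype(np.int64)
    valid = (z > 0) & np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=-1)

    # Scatter features into voxels, averaging when several pixels collide.
    for (i, j, k), f in zip(idx[valid], features[valid]):
        volume[i, j, k] += f
        count[i, j, k] += 1
    nonempty = count > 0
    volume[nonempty] /= count[nonempty][:, None]
    return volume
```

The resulting `(X, Y, Z, C)` feature volume is what a subsequent 3D CNN would consume to reason about scene context; in the actual network this projection must also be differentiable with respect to the 2D features, which the scatter-average above preserves.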
Cite
Text
Guo and Tong. "View-Volume Network for Semantic Scene Completion from a Single Depth Image." International Joint Conference on Artificial Intelligence, 2018. doi:10.24963/IJCAI.2018/101
Markdown
[Guo and Tong. "View-Volume Network for Semantic Scene Completion from a Single Depth Image." International Joint Conference on Artificial Intelligence, 2018.](https://mlanthology.org/ijcai/2018/guo2018ijcai-view/) doi:10.24963/IJCAI.2018/101
BibTeX
@inproceedings{guo2018ijcai-view,
title = {{View-Volume Network for Semantic Scene Completion from a Single Depth Image}},
author = {Guo, Yuxiao and Tong, Xin},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2018},
pages = {726--732},
doi = {10.24963/IJCAI.2018/101},
url = {https://mlanthology.org/ijcai/2018/guo2018ijcai-view/}
}