ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes
Abstract
A key requirement for leveraging supervised deep learning methods is the availability of large, labeled datasets. Unfortunately, in the context of RGB-D scene understanding, very little data is available -- current datasets cover a small range of scene views and have limited semantic annotations. To address this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations. To collect this data, we designed an easy-to-use and scalable RGB-D capture system that includes automated surface reconstruction and crowdsourced semantic annotation. We show that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks, including 3D object classification, semantic voxel labeling, and CAD model retrieval.
Cite
Text
Dai et al. "ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.261
Markdown
[Dai et al. "ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/dai2017cvpr-scannet/) doi:10.1109/CVPR.2017.261
BibTeX
@inproceedings{dai2017cvpr-scannet,
title = {{ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes}},
author = {Dai, Angela and Chang, Angel X. and Savva, Manolis and Halber, Maciej and Funkhouser, Thomas and Niessner, Matthias},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2017},
doi = {10.1109/CVPR.2017.261},
url = {https://mlanthology.org/cvpr/2017/dai2017cvpr-scannet/}
}