ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes

Abstract

We present ScanNet++, a large-scale dataset that couples the capture of high-quality and commodity-level geometry and color of indoor scenes. Each scene is captured with a high-end laser scanner at sub-millimeter resolution, along with registered 33-megapixel images from a DSLR camera and RGB-D streams from an iPhone. Scene reconstructions are further annotated with an open vocabulary of semantics, with label-ambiguous scenarios explicitly annotated for comprehensive semantic understanding. ScanNet++ enables a new real-world benchmark for novel view synthesis, both from high-quality RGB capture and, importantly, from commodity-level images, in addition to a new benchmark for 3D semantic scene understanding that comprehensively encapsulates diverse and ambiguous semantic labeling scenarios. Currently, ScanNet++ contains 460 scenes, 280,000 captured DSLR images, and over 3.7M iPhone RGB-D frames.

Cite

Text

Yeshwanth et al. "ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00008

Markdown

[Yeshwanth et al. "ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/yeshwanth2023iccv-scannet/) doi:10.1109/ICCV51070.2023.00008

BibTeX

@inproceedings{yeshwanth2023iccv-scannet,
  title     = {{ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes}},
  author    = {Yeshwanth, Chandan and Liu, Yueh-Cheng and Nießner, Matthias and Dai, Angela},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {12--22},
  doi       = {10.1109/ICCV51070.2023.00008},
  url       = {https://mlanthology.org/iccv/2023/yeshwanth2023iccv-scannet/}
}