SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite

Abstract

Although RGB-D sensors have enabled major breakthroughs for several vision tasks, such as 3D reconstruction, we have not attained the same level of success in high-level scene understanding. Perhaps one of the main reasons is the lack of a large-scale benchmark with 3D annotations and 3D evaluation metrics. In this paper, we introduce an RGB-D benchmark suite with the goal of advancing the state of the art in all major scene understanding tasks. Our dataset is captured by four different sensors and contains 10,335 RGB-D images, a scale similar to that of PASCAL VOC. The whole dataset is densely annotated and includes 146,617 2D polygons and 64,595 3D bounding boxes with accurate object orientations, as well as a 3D room layout and scene category for each image. This dataset enables us to train data-hungry algorithms for scene understanding tasks, evaluate them using meaningful 3D metrics, avoid overfitting to a small testing set, and study cross-sensor bias.

Cite

Text

Song et al. "SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite." Conference on Computer Vision and Pattern Recognition, 2015. doi:10.1109/CVPR.2015.7298655

Markdown

[Song et al. "SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite." Conference on Computer Vision and Pattern Recognition, 2015.](https://mlanthology.org/cvpr/2015/song2015cvpr-sun/) doi:10.1109/CVPR.2015.7298655

BibTeX

@inproceedings{song2015cvpr-sun,
  title     = {{SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite}},
  author    = {Song, Shuran and Lichtenberg, Samuel P. and Xiao, Jianxiong},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2015},
  doi       = {10.1109/CVPR.2015.7298655},
  url       = {https://mlanthology.org/cvpr/2015/song2015cvpr-sun/}
}