Incremental 3D Semantic Scene Graph Prediction from RGB Sequences

Abstract

3D semantic scene graphs are a powerful holistic representation: they describe the individual objects in a scene and the relations between them. They are compact, high-level graphs that enable many tasks requiring scene reasoning. In real-world settings, existing 3D estimation methods produce robust predictions, but they mostly rely on dense inputs. In this work, we propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence. Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network. The proposed pipeline simultaneously reconstructs a sparse point map and fuses entity estimates from the input images. The proposed network estimates 3D semantic scene graphs with iterative message passing, using multi-view and geometric features extracted from the scene entities. Extensive experiments on the 3RScan dataset show the effectiveness of the proposed method in this challenging task, outperforming state-of-the-art approaches.
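
To make the idea of an incrementally built scene graph concrete, the following is a minimal Python sketch of the data structure only. It is not the method of the paper: the class name `IncrementalSceneGraph`, the majority-vote fusion rule, and the example labels are all assumptions chosen for illustration, and the sketch omits the sparse point map reconstruction and the prediction network entirely.

```python
from collections import Counter, defaultdict


class IncrementalSceneGraph:
    """Toy scene graph: nodes are entities, directed edges carry predicates.

    Per-frame predictions are fused by majority vote. This fusion rule is an
    illustrative assumption, not the scheme used in the paper.
    """

    def __init__(self):
        self.node_votes = defaultdict(Counter)  # entity id -> label counts
        self.edge_votes = defaultdict(Counter)  # (subject, object) -> predicate counts

    def update(self, node_preds, edge_preds):
        """Fuse one frame: node_preds {id: label}, edge_preds {(s, o): predicate}."""
        for entity_id, label in node_preds.items():
            self.node_votes[entity_id][label] += 1
        for pair, predicate in edge_preds.items():
            self.edge_votes[pair][predicate] += 1

    def nodes(self):
        """Current label of every entity: the most frequently voted label."""
        return {i: votes.most_common(1)[0][0] for i, votes in self.node_votes.items()}

    def triplets(self):
        """Current (subject label, predicate, object label) triplets."""
        labels = self.nodes()
        return [
            (labels[s], votes.most_common(1)[0][0], labels[o])
            for (s, o), votes in self.edge_votes.items()
            if s in labels and o in labels
        ]


# Hypothetical per-frame predictions for three frames of a sequence.
frames = [
    ({1: "chair", 2: "floor"}, {(1, 2): "standing on"}),
    ({1: "sofa", 2: "floor"}, {(1, 2): "standing on"}),
    ({1: "chair", 3: "table"}, {(1, 2): "standing on", (3, 2): "standing on"}),
]

graph = IncrementalSceneGraph()
for node_preds, edge_preds in frames:
    graph.update(node_preds, edge_preds)

print(graph.nodes())
print(graph.triplets())
```

The graph stays consistent across frames because a single noisy prediction (the "sofa" label in the second frame) is outvoted by the other observations of the same entity.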

Cite

Text

Wu et al. "Incremental 3D Semantic Scene Graph Prediction from RGB Sequences." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00490

Markdown

[Wu et al. "Incremental 3D Semantic Scene Graph Prediction from RGB Sequences." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/wu2023cvpr-incremental/) doi:10.1109/CVPR52729.2023.00490

BibTeX

@inproceedings{wu2023cvpr-incremental,
  title     = {{Incremental 3D Semantic Scene Graph Prediction from RGB Sequences}},
  author    = {Wu, Shun-Cheng and Tateno, Keisuke and Navab, Nassir and Tombari, Federico},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {5064-5074},
  doi       = {10.1109/CVPR52729.2023.00490},
  url       = {https://mlanthology.org/cvpr/2023/wu2023cvpr-incremental/}
}