Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs
Abstract
We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as “pick up a cup on a kitchen table” or “navigate to a sofa on which someone is sitting”. In contrast to existing research on 3D scene graphs, OVSG supports free-form text input and open-vocabulary querying. Through a series of comparative experiments using the ScanNet dataset and a self-collected dataset, we demonstrate that our proposed approach significantly surpasses the performance of previous semantic-based localization techniques. Moreover, we highlight the practical application of OVSG in real-world robot navigation and manipulation experiments. The code and dataset used for evaluation will be made available upon publication.
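The idea of context-aware grounding described above can be pictured with a small sketch: scene-graph nodes carry free-form labels and relational edges, and a query is scored against both a node's own label and the text of its neighborhood. Everything below (the `Node` class, the `ground` function, and the bag-of-words `embed` stand-in for a learned open-vocabulary text encoder) is an illustrative assumption for intuition only, not the paper's actual OVSG implementation.

```python
# Toy sketch of context-aware grounding on a scene graph (illustrative only).
# A real open-vocabulary system would use a learned text/vision embedding
# rather than the bag-of-words vectors used here.
from dataclasses import dataclass, field
import math


def embed(text: str) -> dict:
    """Toy bag-of-words 'embedding'; stands in for a learned text encoder."""
    counts = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0) + 1
    return counts


def cosine(a: dict, b: dict) -> float:
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Node:
    name: str        # free-form label, e.g. "kitchen table"
    kind: str        # "object", "agent", or "region"
    position: tuple  # 3D centroid of the entity
    edges: list = field(default_factory=list)  # (relation text, neighbor Node)


def ground(query: str, nodes: list, context_weight: float = 0.5) -> Node:
    """Return the node whose label plus relational context best matches the query."""
    q = embed(query)

    def score(node: Node) -> float:
        label_sim = cosine(q, embed(node.name))
        # Describe the node's neighborhood as text and compare it to the query.
        context = " ".join(f"{rel} {nbr.name}" for rel, nbr in node.edges)
        context_sim = cosine(q, embed(context)) if context else 0.0
        return (1 - context_weight) * label_sim + context_weight * context_sim

    return max(nodes, key=score)


# Minimal example: two cups, one on a kitchen table and one on an office desk.
table = Node("kitchen table", "object", (2.0, 0.5, 0.0))
desk = Node("office desk", "object", (5.0, 1.0, 0.0))
cup_a = Node("cup", "object", (2.0, 0.5, 0.8), edges=[("on", table)])
cup_b = Node("cup", "object", (5.0, 1.0, 0.8), edges=[("on", desk)])

best = ground("a cup on a kitchen table", [cup_a, cup_b, table, desk])
print(best.name, best.position)  # the cup whose context mentions the kitchen table
```

The context term is what distinguishes this from plain semantic lookup: both cups match the word "cup" equally well, but only the one related to the kitchen table matches the rest of the query.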
Cite
Text
Chang et al. "Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs." Conference on Robot Learning, 2023.
Markdown
[Chang et al. "Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs." Conference on Robot Learning, 2023.](https://mlanthology.org/corl/2023/chang2023corl-contextaware/)
BibTeX
@inproceedings{chang2023corl-contextaware,
  title = {{Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs}},
  author = {Chang, Haonan and Boyalakuntla, Kowndinya and Lu, Shiyang and Cai, Siwei and Jing, Eric Pu and Keskar, Shreesh and Geng, Shijie and Abbas, Adeeb and Zhou, Lifeng and Bekris, Kostas and Boularias, Abdeslam},
  booktitle = {Conference on Robot Learning},
  year = {2023},
  pages = {1950--1974},
  volume = {229},
  url = {https://mlanthology.org/corl/2023/chang2023corl-contextaware/}
}