Free-Form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud

Abstract

3D object grounding aims to locate the most relevant target object in a raw point cloud scene based on a free-form language description. Understanding complex and diverse descriptions and lifting them directly to a point cloud is a new and challenging topic due to the irregular and sparse nature of point clouds. There are three main challenges in 3D object grounding: finding the main focus in the complex and diverse description, understanding the point cloud scene, and locating the target object. In this paper, we address all three challenges. Firstly, we propose a language scene graph module to capture the rich structure and long-distance phrase correlations. Secondly, we introduce a multi-level 3D proposal relation graph module to extract the object-object and object-scene co-occurrence relationships, and strengthen the visual features of the initial proposals. Lastly, we develop a description-guided 3D visual graph module to encode global contexts of phrases and proposals by a node matching strategy. Extensive experiments on challenging benchmark datasets (ScanRefer and Nr3D) show that our algorithm outperforms existing state-of-the-art methods. Our code is available at https://github.com/PNXD/FFL-3DOG.
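
The idea of matching phrase nodes to proposal nodes can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes that each phrase and each 3D proposal has already been encoded as a feature vector in a shared space, and it substitutes plain cosine similarity with a softmax for whatever learned matching the paper uses. The names `match_nodes` and `ground_target`, and all feature values, are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def match_nodes(phrase_feats, proposal_feats):
    """For each phrase node, return a distribution over proposal nodes.

    Assumption: similarity is plain cosine similarity; a trained model
    would learn this matching function instead.
    """
    return [softmax([cosine(p, q) for q in proposal_feats])
            for p in phrase_feats]

def ground_target(phrase_feats, proposal_feats, target_idx=0):
    """Return the index of the proposal that best matches the target phrase."""
    weights = match_nodes(phrase_feats, proposal_feats)[target_idx]
    return max(range(len(weights)), key=weights.__getitem__)

# Toy features (made up): two phrase nodes and three object proposals.
phrases = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.0]]
proposals = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.3], [0.0, 0.0, 1.0]]

best = ground_target(phrases, proposals)
print("proposal matched to the target phrase:", best)
```

Here the first phrase node is treated as the main focus of the description, and the proposal with the highest matching weight is returned as the grounded object.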

Cite

Text

Feng et al. "Free-Form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00370

Markdown

[Feng et al. "Free-Form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/feng2021iccv-freeform/) doi:10.1109/ICCV48922.2021.00370

BibTeX

@inproceedings{feng2021iccv-freeform,
  title     = {{Free-Form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud}},
  author    = {Feng, Mingtao and Li, Zhen and Li, Qi and Zhang, Liang and Zhang, XiangDong and Zhu, Guangming and Zhang, Hui and Wang, Yaonan and Mian, Ajmal},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {3722--3731},
  doi       = {10.1109/ICCV48922.2021.00370},
  url       = {https://mlanthology.org/iccv/2021/feng2021iccv-freeform/}
}