Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression

Gao, Chen; Chen, Jinyu; Liu, Si; Wang, Luting; Zhang, Qiong; Wu, Qi

doi:10.1109/CVPR46437.2021.00308

CVPR 2021 pp. 3064-3073


Abstract

Remote Embodied Referring Expression (REVERIE) is a recently proposed task that requires an agent to navigate to and localise a referred remote object according to a high-level language instruction. Unlike related vision-and-language navigation (VLN) tasks, the key to REVERIE is goal-oriented exploration rather than strict instruction-following, because step-by-step navigation guidance is not provided. In this paper, we propose a novel Cross-modality Knowledge Reasoning (CKR) model to address the unique challenges of this task. CKR, based on a transformer architecture, learns to generate scene memory tokens and utilise these informative history clues for exploration. In particular, a Room-and-Object Aware Attention (ROAA) mechanism is devised to explicitly perceive room- and object-type information from both linguistic and visual observations. Moreover, by incorporating commonsense knowledge, we propose a Knowledge-enabled Entity Relationship Reasoning (KERR) module that learns the internal-external correlations among room and object entities, so that the agent can take a proper action at each viewpoint. Evaluation on the REVERIE benchmark demonstrates the superiority of the CKR model, which significantly boosts SPL and REVERIE-success rate by 64.67% and 46.05%, respectively. Code is available at: https://github.com/alloldman/CKR.
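
For readers unfamiliar with the SPL metric reported in the abstract, the following is a minimal sketch of its standard definition (Success weighted by Path Length): each successful episode contributes the ratio of the shortest-path length to the longer of the path actually taken and the shortest path, and the result is averaged over all episodes. The function name `spl` and its argument names are illustrative assumptions and are not taken from the CKR code release.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    successes[i]        -- whether episode i reached the goal
    shortest_lengths[i] -- shortest-path length from start to goal
    path_lengths[i]     -- length of the path the agent actually took
    """
    n = len(successes)
    if n == 0 or not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if not success:
            continue  # failed episodes contribute zero
        denom = max(taken, shortest)
        # Guard the degenerate case where the agent starts at the goal.
        total += shortest / denom if denom > 0 else 1.0
    return total / n


# Hypothetical episodes: optimal success, success with a 2x detour, failure.
score = spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 15.0])
print(f"SPL = {score:.3f}")
```

Because the detour halves the second episode's contribution and the failed episode contributes nothing, an agent is rewarded only for reaching the goal efficiently, which is why SPL is a natural headline metric for goal-oriented exploration.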

Cite

Text

Gao et al. "Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00308

Markdown

[Gao et al. "Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/gao2021cvpr-roomandobject/) doi:10.1109/CVPR46437.2021.00308

BibTeX

@inproceedings{gao2021cvpr-roomandobject,
  title     = {{Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression}},
  author    = {Gao, Chen and Chen, Jinyu and Liu, Si and Wang, Luting and Zhang, Qiong and Wu, Qi},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {3064--3073},
  doi       = {10.1109/CVPR46437.2021.00308},
  url       = {https://mlanthology.org/cvpr/2021/gao2021cvpr-roomandobject/}
}