Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression
Abstract
Remote Embodied Referring Expression (REVERIE) is a recently proposed task that requires an agent to navigate to and localise a remote object referred to by a high-level language instruction. Unlike related Vision-and-Language Navigation (VLN) tasks, the key to REVERIE is goal-oriented exploration rather than strict instruction-following, because step-by-step navigation guidance is not provided. In this paper, we propose a novel Cross-modality Knowledge Reasoning (CKR) model to address the unique challenges of this task. CKR, which is based on a transformer architecture, learns to generate scene memory tokens and to utilise these informative history clues for exploration. In particular, a Room-and-Object Aware Attention (ROAA) mechanism is devised to explicitly perceive room- and object-type information from both linguistic and visual observations. Moreover, by incorporating commonsense knowledge, we propose a Knowledge-enabled Entity Relationship Reasoning (KERR) module that learns the internal-external correlations among room and object entities so that the agent can take a proper action at each viewpoint. Evaluation on the REVERIE benchmark demonstrates the superiority of the CKR model, which significantly boosts SPL and REVERIE success rate by 64.67% and 46.05%, respectively. Code is available at: https://github.com/alloldman/CKR.
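The abstract reports gains in SPL without defining it. As a minimal sketch, assuming SPL here means the Success weighted by Path Length metric commonly used in embodied navigation, it can be computed as the average over episodes of S_i * l_i / max(p_i, l_i), where S_i is the success indicator, l_i the shortest-path length and p_i the length of the path actually taken. The function and parameter names below are illustrative and are not taken from the CKR code base.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over all episodes.

    Illustrative helper (hypothetical name and signature), not the
    official REVERIE evaluation script.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")
    if len(successes) == 0:
        raise ValueError("at least one episode is required")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if not success:
            continue  # failed episodes contribute zero
        denom = max(taken, shortest)
        # Assumption: a successful zero-length episode counts as optimal.
        total += shortest / denom if denom > 0 else 1.0
    return total / len(successes)


# Three episodes: optimal success, success with a 2x detour, failure.
score = spl([True, True, False], [10.0, 10.0, 5.0], [10.0, 20.0, 5.0])
print(score)  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

Because failed episodes score zero and detours are penalised in proportion to their extra length, an agent improves SPL only by reaching the goal more often or by reaching it along shorter paths.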
Cite
Text
Gao et al. "Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00308
Markdown
[Gao et al. "Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/gao2021cvpr-roomandobject/) doi:10.1109/CVPR46437.2021.00308
BibTeX
@inproceedings{gao2021cvpr-roomandobject,
title = {{Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression}},
author = {Gao, Chen and Chen, Jinyu and Liu, Si and Wang, Luting and Zhang, Qiong and Wu, Qi},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2021},
pages = {3064--3073},
doi = {10.1109/CVPR46437.2021.00308},
url = {https://mlanthology.org/cvpr/2021/gao2021cvpr-roomandobject/}
}