Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation

Abstract

We address the practical yet challenging problem of training robot agents to navigate an environment by following a path described in language instructions. The instructions often contain descriptions of objects in the environment. To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both the spatial locations and the semantic information of the objects in the environment. However, enabling a robot to build a map that represents the environment well is extremely challenging, as the environment often involves diverse objects with various attributes. In this paper, we propose a multi-granularity map, which contains both fine-grained object details (e.g., color, texture) and semantic classes, to represent objects more comprehensively. Moreover, we propose a weakly-supervised auxiliary task, which requires the agent to localize instruction-relevant objects on the map. Through this task, the agent not only learns to localize the instruction-relevant objects for navigation but is also encouraged to learn a better map representation that reveals object information. We then feed the learned map and the instruction to a waypoint predictor to determine the next navigation goal. Experimental results show that our method outperforms the state of the art by 4.0% and 4.6% in success rate in seen and unseen environments, respectively, on the VLN-CE dataset. The code is available at https://github.com/PeihaoChen/WS-MGMap.
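
To make the idea of a multi-granularity map more concrete, the sketch below builds a small top-down grid in which each cell stores both a coarse semantic class distribution and a mean fine-grained feature vector. It is only an illustration under stated assumptions and is not taken from the authors' repository: the names `Cell` and `build_map`, the constants `NUM_CLASSES` and `FEATURE_DIM`, and the simple per-cell averaging are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

NUM_CLASSES = 3   # hypothetical number of semantic classes
FEATURE_DIM = 3   # hypothetical fine-grained feature size (e.g. mean RGB)

# One observed point: (x in metres, y in metres, class id, fine-grained feature)
Point = Tuple[float, float, int, Sequence[float]]


@dataclass
class Cell:
    """One map cell holding coarse (class) and fine (feature) information."""
    class_counts: List[int] = field(default_factory=lambda: [0] * NUM_CLASSES)
    feature_sum: List[float] = field(default_factory=lambda: [0.0] * FEATURE_DIM)
    n_points: int = 0

    def vector(self) -> List[float]:
        """Concatenate the class distribution with the mean fine-grained feature."""
        if self.n_points == 0:
            return [0.0] * (NUM_CLASSES + FEATURE_DIM)
        class_dist = [c / self.n_points for c in self.class_counts]
        mean_feat = [s / self.n_points for s in self.feature_sum]
        return class_dist + mean_feat


def build_map(points: Sequence[Point], grid_size: int, cell_m: float) -> List[List[Cell]]:
    """Accumulate observed points into a grid_size x grid_size top-down map."""
    grid = [[Cell() for _ in range(grid_size)] for _ in range(grid_size)]
    for x, y, class_id, feat in points:
        col, row = int(x // cell_m), int(y // cell_m)
        if not (0 <= row < grid_size and 0 <= col < grid_size):
            continue  # skip points that fall outside the mapped area
        cell = grid[row][col]
        cell.class_counts[class_id] += 1
        for i, value in enumerate(feat):
            cell.feature_sum[i] += value
        cell.n_points += 1
    return grid
```

In this toy version, calling `build_map(points, grid_size=2, cell_m=1.0)` and then `grid[row][col].vector()` yields one vector per cell that mixes both granularities. The method described in the abstract learns its map representation and supervises it with the auxiliary localization task, which this sketch does not attempt to reproduce.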

Cite

Text

Chen et al. "Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation." Neural Information Processing Systems, 2022.

Markdown

[Chen et al. "Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/chen2022neurips-weaklysupervised/)

BibTeX

@inproceedings{chen2022neurips-weaklysupervised,
  title     = {{Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation}},
  author    = {Chen, Peihao and Ji, Dongyu and Lin, Kunyang and Zeng, Runhao and Li, Thomas and Tan, Mingkui and Gan, Chuang},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/chen2022neurips-weaklysupervised/}
}