A Priority mAP for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues

Abstract

On a busy city street, a pedestrian surrounded by distractions can pick out a single sign if it is relevant to their route. Artificial agents in outdoor Vision-and-Language Navigation (VLN) face a similar challenge: detecting supervisory signals on environment features and location in their inputs. To boost the prominence of relevant features in transformer-based systems without costly preprocessing and pretraining, we take inspiration from priority maps, a mechanism described in neuropsychological studies. We implement a novel priority map module and pretrain it on auxiliary tasks using low-sample datasets with high-level representations of routes and environment-related references to urban features. A hierarchical process of trajectory planning, followed by parameterised visual boost filtering on visual inputs and prediction of corresponding textual spans, addresses the core challenge of cross-modal alignment and feature-level localisation. The priority map module is integrated into a feature-location framework that doubles the task completion rates of standalone transformers and attains state-of-the-art performance for transformer-based systems on the Touchdown benchmark for VLN. We release code (https://github.com/JasonArmitage-res/PM-VLN) and data (https://zenodo.org/record/6891965#.YtwoS3ZBxD8).

Cite

Text

Armitage et al. "A Priority mAP for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues." Winter Conference on Applications of Computer Vision, 2023.

Markdown

[Armitage et al. "A Priority mAP for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues." Winter Conference on Applications of Computer Vision, 2023.](https://mlanthology.org/wacv/2023/armitage2023wacv-priority/)

BibTeX

@inproceedings{armitage2023wacv-priority,
  title     = {{A Priority mAP for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues}},
  author    = {Armitage, Jason and Impett, Leonardo and Sennrich, Rico},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2023},
  pages     = {1094--1103},
  url       = {https://mlanthology.org/wacv/2023/armitage2023wacv-priority/}
}