Counterfactual Vision-and-Language Navigation: Unravelling the Unseen
Abstract
The task of vision-and-language navigation (VLN) requires an agent to follow text instructions to find its way through simulated household environments. A prominent challenge is to train an agent capable of generalising to new environments at test time, rather than one that simply memorises trajectories and visual details observed during training. We propose a new learning strategy that learns both from observations and from generated counterfactual environments. We describe an effective algorithm to generate counterfactual observations on the fly for VLN, as linear combinations of existing environments. Simultaneously, we encourage the agent's actions to remain stable between original and counterfactual environments through our novel training objective, effectively removing the spurious features that otherwise bias the agent. Our experiments show that this technique provides significant improvements in generalisation on benchmarks for Room-to-Room navigation and Embodied Question Answering.
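To make the abstract's two ingredients concrete, here is a minimal sketch, not the authors' implementation: (1) a counterfactual observation formed as a linear combination of visual features from two environments, and (2) a consistency objective that penalises divergence between the agent's action distributions on the original and counterfactual inputs. The function names, the Beta-distributed mixing coefficient, and the feature and action dimensions are all assumptions made for illustration.

```python
# Sketch only: illustrates the two ideas described in the abstract under
# assumed names and shapes; it is not the paper's actual algorithm.
import torch
import torch.nn.functional as F


def mix_features(obs_a: torch.Tensor, obs_b: torch.Tensor,
                 alpha: float = 0.5) -> torch.Tensor:
    """Counterfactual observation as a convex combination of features
    from two existing environments (Beta-sampled weight is an assumption)."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * obs_a + (1.0 - lam) * obs_b


def consistency_loss(logits_orig: torch.Tensor,
                     logits_cf: torch.Tensor) -> torch.Tensor:
    """Penalise divergence between the action distributions produced on the
    original and counterfactual observations (KL here is one possible choice)."""
    log_p = F.log_softmax(logits_orig, dim=-1)
    q = F.softmax(logits_cf, dim=-1)
    return F.kl_div(log_p, q, reduction="batchmean")


# Toy usage with a linear policy head over 6 discrete navigation actions.
policy = torch.nn.Linear(512, 6)
obs_a = torch.randn(8, 512)   # visual features from one environment
obs_b = torch.randn(8, 512)   # features from a different environment
obs_cf = mix_features(obs_a, obs_b)

loss = consistency_loss(policy(obs_a), policy(obs_cf))
loss.backward()
```

In practice a term like this would be added to the standard navigation loss (imitation or reinforcement learning) rather than optimised on its own, so that the agent both follows instructions and stays invariant to the counterfactual perturbation.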
Cite
Text
Parvaneh et al. "Counterfactual Vision-and-Language Navigation: Unravelling the Unseen." Neural Information Processing Systems, 2020.
Markdown
[Parvaneh et al. "Counterfactual Vision-and-Language Navigation: Unravelling the Unseen." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/parvaneh2020neurips-counterfactual/)
BibTeX
@inproceedings{parvaneh2020neurips-counterfactual,
  title = {{Counterfactual Vision-and-Language Navigation: Unravelling the Unseen}},
  author = {Parvaneh, Amin and Abbasnejad, Ehsan and Teney, Damien and Shi, Javen Qinfeng and van den Hengel, Anton},
  booktitle = {Neural Information Processing Systems},
  year = {2020},
  url = {https://mlanthology.org/neurips/2020/parvaneh2020neurips-counterfactual/}
}