Vision-and-Language Navigation via Causal Learning
Abstract
In the pursuit of robust and generalizable environment perception and language understanding, the ubiquitous challenge of dataset bias continues to plague vision-and-language navigation (VLN) agents, hindering their performance in unseen environments. This paper introduces the generalized cross-modal causal transformer (GOAT), a pioneering solution rooted in the paradigm of causal inference. By examining both observable and unobservable confounders within vision, language, and navigation history, we propose the back-door and front-door adjustment causal learning (BACL and FACL) modules, which promote unbiased learning by comprehensively mitigating potential spurious correlations. Additionally, to capture global confounder features, we propose a cross-modal feature pooling (CFP) module supervised by contrastive learning, which is also shown to be effective in improving cross-modal representations during pre-training. Extensive experiments across multiple VLN datasets (R2R, REVERIE, RxR, and SOON) underscore the superiority of our proposed method over previous state-of-the-art approaches. Code is available at https://github.com/CrystalSixone/VLN-GOAT.
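The back-door and front-door adjustments named above are standard causal-inference identities: back-door adjustment computes P(y | do(x)) = Σ_z P(y | x, z) P(z) when the confounder z is observed, and front-door adjustment computes P(y | do(x)) = Σ_m P(m | x) Σ_x' P(y | m, x') P(x') through a mediator m when the confounder is unobserved. The sketch below is a minimal illustration on hypothetical binary variables with made-up probabilities; it is not GOAT's implementation (the abstract does not say how BACL and FACL apply these formulas to learned features), and every function and variable name in it is hypothetical. It shows how a purely correlational estimate P(y | x) can overstate the effect of x, while each adjustment recovers the interventional quantity.

```python
from itertools import product


def bern(p1: float, v: int) -> float:
    """Probability of binary value v under a Bernoulli with P(1) = p1."""
    return p1 if v == 1 else 1.0 - p1


def marginal(joint, keep):
    """Sum a joint table {(a, b, ...): p} down to the tuple positions in `keep`."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out


def conditional(joint, target, given):
    """P(target | given), keyed by (target values..., given values...)."""
    num = marginal(joint, target + given)
    den = marginal(joint, given)
    return {k: p / den[k[len(target):]] for k, p in num.items()}


def p_y_given_x(joint, x, x_idx, y_idx):
    """Observational P(y=1 | x): what a purely correlational learner estimates."""
    return conditional(joint, (y_idx,), (x_idx,))[(1, x)]


def backdoor(joint, x):
    """P(y=1 | do(x)) = sum_z P(y=1 | x, z) P(z); joint keyed by (z, x, y)."""
    p_z = marginal(joint, (0,))
    p_y_xz = conditional(joint, (2,), (1, 0))  # keys: (y, x, z)
    return sum(p_y_xz[(1, x, z)] * p_z[(z,)] for z in (0, 1))


def frontdoor(joint, x):
    """P(y=1 | do(x)) = sum_m P(m | x) sum_x' P(y=1 | m, x') P(x'); joint keyed by (x, m, y)."""
    p_x = marginal(joint, (0,))
    p_m_x = conditional(joint, (1,), (0,))     # keys: (m, x)
    p_y_mx = conditional(joint, (2,), (1, 0))  # keys: (y, m, x')
    return sum(
        p_m_x[(m, x)] * sum(p_y_mx[(1, m, xp)] * p_x[(xp,)] for xp in (0, 1))
        for m in (0, 1)
    )


# Back-door toy model (hypothetical numbers): observed z -> x, z -> y, and x -> y.
bd_joint = {
    (z, x, y): bern(0.5, z) * bern(0.9 if z else 0.1, x) * bern(0.2 + 0.1 * x + 0.6 * z, y)
    for z, x, y in product((0, 1), repeat=3)
}

# Front-door toy model (hypothetical numbers): hidden u -> x, u -> y, and x -> m -> y.
fd_full = {
    (u, x, m, y): bern(0.5, u) * bern(0.8 if u else 0.2, x)
    * bern(0.7 if x else 0.1, m) * bern(0.1 + 0.5 * m + 0.3 * u, y)
    for u, x, m, y in product((0, 1), repeat=4)
}
fd_joint = marginal(fd_full, (1, 2, 3))  # drop the unobserved u; keys are (x, m, y)


def true_do_fd(x):
    """Ground-truth P(y=1 | do(x)) computed with the hidden u, for comparison only."""
    return sum(
        bern(0.5, u) * bern(0.7 if x else 0.1, m) * (0.1 + 0.5 * m + 0.3 * u)
        for u, m in product((0, 1), repeat=2)
    )


if __name__ == "__main__":
    for name, joint, adjust, x_idx, y_idx in (
        ("back-door", bd_joint, backdoor, 1, 2),
        ("front-door", fd_joint, frontdoor, 0, 2),
    ):
        naive_effect = p_y_given_x(joint, 1, x_idx, y_idx) - p_y_given_x(joint, 0, x_idx, y_idx)
        causal_effect = adjust(joint, 1) - adjust(joint, 0)
        print(f"{name}: correlational {naive_effect:.2f}, adjusted {causal_effect:.2f}")
```

With these made-up numbers, the back-door case should report a correlational gap of about 0.58 against an adjusted effect of 0.10, and the front-door case about 0.48 against 0.30, where 0.30 matches the ground truth computed with the hidden confounder. The front-door estimate never touches u, which is why that adjustment suits confounders that cannot be observed directly.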
Cite
Wang et al. "Vision-and-Language Navigation via Causal Learning." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01248
@inproceedings{wang2024cvpr-visionandlanguage,
title = {{Vision-and-Language Navigation via Causal Learning}},
author = {Wang, Liuyi and He, Zongtao and Dang, Ronghao and Shen, Mengjiao and Liu, Chengju and Chen, Qijun},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {13139-13150},
doi = {10.1109/CVPR52733.2024.01248},
url = {https://mlanthology.org/cvpr/2024/wang2024cvpr-visionandlanguage/}
}