Counterfactual VQA: A Cause-Effect Look at Language Bias

Abstract

Recent VQA models tend to rely on language bias as a shortcut and thus fail to sufficiently learn multi-modal knowledge from both vision and language. In this paper, we investigate how to capture and mitigate language bias in VQA. Motivated by causal effects, we propose a novel counterfactual inference framework, which enables us to capture language bias as the direct causal effect of questions on answers, and to reduce that bias by subtracting the direct language effect from the total causal effect. Experiments demonstrate that our counterfactual inference framework 1) is general to various VQA backbones and fusion strategies, and 2) achieves competitive performance on the language-bias-sensitive VQA-CP dataset while performing robustly on the balanced VQA v2 dataset without any augmented data. The code is available at https://github.com/yuleiniu/cfvqa.
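The debiasing idea described above can be sketched in a few lines: at inference time, subtract the answer logits produced from the question alone (the direct language effect) from the logits produced by the full vision-language model (the total effect). This is a minimal illustrative sketch, not the authors' implementation; the function and variable names are hypothetical, and the real framework (see the linked repository) derives this subtraction from natural direct and total causal effects.

```python
import numpy as np

def counterfactual_debias(fused_logits, question_only_logits):
    """Sketch of counterfactual debiasing for VQA.

    fused_logits: answer scores from the full vision+language model
                  (total causal effect of image and question).
    question_only_logits: answer scores from a question-only branch
                          (direct causal effect of language alone).
    Subtracting the language-only effect removes the answer bias the
    model could exploit from the question text without looking at
    the image.
    """
    return np.asarray(fused_logits) - np.asarray(question_only_logits)

# Toy example: the question-only branch strongly favors answer 0
# (e.g. always answering "tennis" to "What sport ...?"); after
# subtraction, the visually grounded evidence for answer 1 wins.
fused = [2.0, 1.5, 0.1]
q_only = [1.8, 0.2, 0.0]
debiased = counterfactual_debias(fused, q_only)
```

In this toy example, the biased prediction (argmax of `fused`) is answer 0, while the debiased prediction (argmax of `debiased`) flips to answer 1, illustrating how removing the language-only effect can change the answer.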

Cite

Text

Niu et al. "Counterfactual VQA: A Cause-Effect Look at Language Bias." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.01251

Markdown

[Niu et al. "Counterfactual VQA: A Cause-Effect Look at Language Bias." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/niu2021cvpr-counterfactual/) doi:10.1109/CVPR46437.2021.01251

BibTeX

@inproceedings{niu2021cvpr-counterfactual,
  title     = {{Counterfactual VQA: A Cause-Effect Look at Language Bias}},
  author    = {Niu, Yulei and Tang, Kaihua and Zhang, Hanwang and Lu, Zhiwu and Hua, Xian-Sheng and Wen, Ji-Rong},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {12700--12710},
  doi       = {10.1109/CVPR46437.2021.01251},
  url       = {https://mlanthology.org/cvpr/2021/niu2021cvpr-counterfactual/}
}