Overcoming Language Priors with Self-Supervised Learning for Visual Question Answering

Zhu, Xi; Mao, Zhendong; Liu, Chunxiao; Zhang, Peng; Wang, Bin; Zhang, Yongdong

doi:10.24963/IJCAI.2020/151

IJCAI 2020 pp. 1083-1089

Abstract

Most Visual Question Answering (VQA) models suffer from the language prior problem, which is caused by inherent data biases. Specifically, VQA models tend to answer questions (e.g., what color is the banana?) with high-frequency answers (e.g., yellow), ignoring the image content. Existing approaches tackle this problem by carefully designing models or by introducing additional visual annotations to reduce question dependency and strengthen image dependency. However, they remain subject to the language prior problem because the data biases themselves have not been fundamentally addressed. In this paper, we introduce a self-supervised learning framework to solve this problem. Concretely, we first automatically generate labeled data to balance the biased data, and then propose a self-supervised auxiliary task that uses the balanced data to help the VQA model overcome language priors. Our method compensates for the data biases by generating balanced data without introducing external annotations. Experimental results show that our method achieves state-of-the-art performance, improving the overall accuracy from 49.50% to 57.59% on the most commonly used benchmark, VQA-CP v2. In other words, we improve on annotation-based methods by 8.09 percentage points, a relative gain of about 16%, without using external annotations. Our code is available on GitHub.
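The abstract does not spell out how the balanced data are generated, so the following is only a minimal sketch of one plausible reading. It assumes that each original question-image pair is treated as relevant (label 1) and that an irrelevant counterpart (label 0) is built by pairing the same question with a different, randomly drawn image. The function name `build_balanced_pairs`, the sample questions, and the image identifiers are all hypothetical and are not taken from the authors' code.

```python
import random


def build_balanced_pairs(samples, seed=0):
    """Return (question, image_id, label) triples with equal label counts.

    samples: list of (question, image_id) pairs from the original data.
    Assumption: label 1 marks the original (relevant) pairing and label 0
    marks a pairing with a randomly chosen different image.
    """
    rng = random.Random(seed)
    image_ids = sorted({image_id for _, image_id in samples})
    if len(image_ids) < 2:
        raise ValueError("need at least two distinct images to build mismatched pairs")

    pairs = []
    for question, image_id in samples:
        # Keep the original pairing as a relevant example.
        pairs.append((question, image_id, 1))
        # Pair the same question with some other image as an irrelevant example.
        other = rng.choice([i for i in image_ids if i != image_id])
        pairs.append((question, other, 0))
    return pairs


# Hypothetical toy data for illustration only.
samples = [
    ("what color is the banana?", "img_001"),
    ("how many dogs are there?", "img_002"),
    ("is the man wearing a hat?", "img_003"),
]
pairs = build_balanced_pairs(samples)
```

Under these assumptions, every question appears once with each label, so the relevance labels are balanced by construction and require no external annotation. An auxiliary task could then ask a model to predict the label from the question and the image together, which cannot be done from the question alone.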

Cite

Text

Zhu et al. "Overcoming Language Priors with Self-Supervised Learning for Visual Question Answering." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/151

Markdown

[Zhu et al. "Overcoming Language Priors with Self-Supervised Learning for Visual Question Answering." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/zhu2020ijcai-overcoming/) doi:10.24963/IJCAI.2020/151

BibTeX

@inproceedings{zhu2020ijcai-overcoming,
  title     = {{Overcoming Language Priors with Self-Supervised Learning for Visual Question Answering}},
  author    = {Zhu, Xi and Mao, Zhendong and Liu, Chunxiao and Zhang, Peng and Wang, Bin and Zhang, Yongdong},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {1083-1089},
  doi       = {10.24963/IJCAI.2020/151},
  url       = {https://mlanthology.org/ijcai/2020/zhu2020ijcai-overcoming/}
}