Semantically Guided Visual Question Answering

Abstract

We present a novel approach that enhances the challenging task of Visual Question Answering (VQA) by incorporating and enriching semantic knowledge in a VQA model. We first apply Multiple Instance Learning (MIL) to extract a richer visual representation that captures concepts beyond objects, such as actions and colors. Motivated by the observation that semantically related answers often appear together among a model's top predictions, we further develop a new semantically guided loss function for model learning that can drive weakly scored but correct answers toward the top while suppressing wrong answers. We show that these two ideas improve performance in a complementary way, and we demonstrate results competitive with the state of the art on two VQA benchmark datasets.
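To make the second idea concrete, here is a toy sketch (not the paper's exact formulation) of one way a "semantically guided" loss can be realized: the one-hot answer target is smoothed by redistributing a fraction `alpha` of its probability mass to semantically related answers, so that related-but-unlabeled answers are penalized less than unrelated ones. The similarity matrix, vocabulary, and `alpha` below are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def semantically_guided_loss(logits, target_idx, similarity, alpha=0.2):
    """Cross-entropy against a semantically smoothed target (illustrative).

    logits     : (V,) answer scores from a VQA model
    target_idx : index of the ground-truth answer
    similarity : (V, V) semantic similarity matrix in [0, 1]
                 (e.g. cosine similarity of answer word embeddings)
    alpha      : fraction of target mass shared with related answers
    """
    V = logits.shape[0]
    sim = similarity[target_idx].copy()
    sim[target_idx] = 0.0                       # exclude the target itself
    related = sim / sim.sum() if sim.sum() > 0 else np.zeros(V)
    # Smoothed target: keep (1 - alpha) on the ground truth,
    # spread alpha across semantically related answers.
    target = (1 - alpha) * np.eye(V)[target_idx] + alpha * related
    log_probs = np.log(softmax(logits) + 1e-12)
    return -np.sum(target * log_probs)

# Toy vocabulary: answers 0 and 1 are related (e.g. "red", "crimson"),
# answer 2 is unrelated (e.g. "dog").
S = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
logits_related = np.array([2.0, 1.5, -1.0])    # residual mass on a related answer
logits_unrelated = np.array([2.0, -1.0, 1.5])  # residual mass on an unrelated answer
```

Under this smoothed target, a prediction that spills probability onto a semantically related answer incurs a lower loss than one that spills the same probability onto an unrelated answer, which is the qualitative behavior the abstract describes.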

Cite

Text

Zhao et al. "Semantically Guided Visual Question Answering." IEEE/CVF Winter Conference on Applications of Computer Vision, 2018. doi:10.1109/WACV.2018.00205

Markdown

[Zhao et al. "Semantically Guided Visual Question Answering." IEEE/CVF Winter Conference on Applications of Computer Vision, 2018.](https://mlanthology.org/wacv/2018/zhao2018wacv-semantically/) doi:10.1109/WACV.2018.00205

BibTeX

@inproceedings{zhao2018wacv-semantically,
  title     = {{Semantically Guided Visual Question Answering}},
  author    = {Zhao, Handong and Fan, Quanfu and Gutfreund, Dan and Fu, Yun},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      = {2018},
  pages     = {1852--1860},
  doi       = {10.1109/WACV.2018.00205},
  url       = {https://mlanthology.org/wacv/2018/zhao2018wacv-semantically/}
}