The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
Abstract
One of the most intriguing features of the Visual Question Answering (VQA) challenge is the unpredictability of the questions. Extracting the information required to answer them demands a variety of image operations, from detection and counting to segmentation and reconstruction. To train a method to perform even one of these operations accurately from (image, question, answer) tuples would be challenging, but to aim to achieve them all with a limited set of such training data seems ambitious at best. Our method thus learns how to exploit a set of external off-the-shelf algorithms to achieve its goal, an approach that has something in common with the Neural Turing Machine. The core of our proposed method is a new co-attention model. In addition, the proposed approach generates human-readable reasons for its decision, and can still be trained end-to-end without ground-truth reasons being given. We demonstrate the effectiveness of our approach on two publicly available datasets, Visual Genome and VQA, and show that it produces state-of-the-art results in both cases.
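The abstract describes the co-attention model only at a high level. Below is a minimal, hypothetical sketch of parallel co-attention between question-word features and image-region features; the function name, the bilinear affinity formulation, and all dimensions are assumptions for illustration, not the paper's exact architecture.

```python
# Minimal sketch of parallel co-attention (assumed formulation, not the
# paper's exact model): a bilinear affinity between question and image
# features yields attention weights over each modality.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(Q, V, W):
    """Q: (n_words, d) question features; V: (n_regions, d) image features;
    W: (d, d) learned bilinear weight. Returns attended summaries of each."""
    C = np.tanh(Q @ W @ V.T)         # (n_words, n_regions) affinity matrix
    a_q = softmax(C.max(axis=1))     # attention over question words
    a_v = softmax(C.max(axis=0))     # attention over image regions
    q_att = a_q @ Q                  # (d,) attended question vector
    v_att = a_v @ V                  # (d,) attended image vector
    return q_att, v_att

# Toy usage with random features in place of real encodings.
rng = np.random.default_rng(0)
q_att, v_att = co_attention(rng.normal(size=(5, 8)),
                            rng.normal(size=(3, 8)),
                            rng.normal(size=(8, 8)))
```

In the full method, such attended vectors would be fused with the outputs of the external off-the-shelf algorithms before answer prediction; that fusion step is not shown here.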
Cite
Text
Wang et al. "The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.416
Markdown
[Wang et al. "The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/wang2017cvpr-vqamachine/) doi:10.1109/CVPR.2017.416
BibTeX
@inproceedings{wang2017cvpr-vqamachine,
title = {{The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions}},
author = {Wang, Peng and Wu, Qi and Shen, Chunhua and van den Hengel, Anton},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2017},
doi = {10.1109/CVPR.2017.416},
url = {https://mlanthology.org/cvpr/2017/wang2017cvpr-vqamachine/}
}