Object-Based Reasoning in VQA

Abstract

Visual Question Answering (VQA) is a novel problem domain in which multi-modal inputs must be processed to solve a task posed in natural language. Because solutions inherently require combining visual and natural language processing with abstract reasoning, the problem is considered AI-complete. Recent advances indicate that using high-level, abstract facts extracted from the inputs might facilitate reasoning. Following that direction, we developed a solution combining state-of-the-art object detection and reasoning modules. The results, achieved on the well-balanced CLEVR dataset, confirm this promise and show significant improvements in accuracy, of a few percent, on the complex "counting" task.

Cite

Text

Desta et al. "Object-Based Reasoning in VQA." IEEE/CVF Winter Conference on Applications of Computer Vision, 2018. doi:10.1109/WACV.2018.00201

Markdown

[Desta et al. "Object-Based Reasoning in VQA." IEEE/CVF Winter Conference on Applications of Computer Vision, 2018.](https://mlanthology.org/wacv/2018/desta2018wacv-object/) doi:10.1109/WACV.2018.00201

BibTeX

@inproceedings{desta2018wacv-object,
  title     = {{Object-Based Reasoning in VQA}},
  author    = {Desta, Mikyas T. and Chen, Larry and Kornuta, Tomasz},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      = {2018},
  pages     = {1814--1823},
  doi       = {10.1109/WACV.2018.00201},
  url       = {https://mlanthology.org/wacv/2018/desta2018wacv-object/}
}