Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing

Abstract

Traditional Visual Question Answering (VQA) datasets typically contain questions about the spatial arrangement of objects, object attributes, or the scene in general. Recently, researchers have recognized the need to better balance such datasets, reducing a system’s dependence on memorized linguistic features and statistical biases while aiming for enhanced visual understanding. However, it is unclear whether any latent patterns exist that quantify and explain the failures of VQA models. As an initial step towards better quantifying the performance of VQA models, we use a taxonomy of Knowledge Gaps (KGs) to tag each question with one or more KG types. Each KG describes the reasoning ability needed to resolve it, and failure to resolve a gap indicates the absence of that ability. After identifying the KGs for each question, we examine the skew in the distribution of questions across KGs. We then introduce a targeted question generation model to reduce this skew, which allows us to generate new types of questions for an image.
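
The tagging and skew analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the KG labels (`attribute`, `spatial`, `commonsense`), the sample questions, and the function names are hypothetical, and normalized entropy is only one assumed way to summarize skew, since the abstract does not name a metric.

```python
from collections import Counter
from math import log

# Hypothetical data: the KG labels and questions below are illustrative only
# and are not taken from the paper's taxonomy or dataset.
tagged_questions = [
    {"question": "What color is the bus?", "kgs": ["attribute"]},
    {"question": "What is left of the dog?", "kgs": ["spatial"]},
    {"question": "Is the red cup on the table?", "kgs": ["attribute", "spatial"]},
    {"question": "Why is the man holding an umbrella?", "kgs": ["commonsense"]},
    {"question": "What color is the car?", "kgs": ["attribute"]},
]


def kg_distribution(questions):
    """Count how many questions carry each KG tag (a question may carry several)."""
    return Counter(kg for q in questions for kg in q["kgs"])


def normalized_entropy(counts):
    """Return entropy of the tag distribution divided by its maximum.

    1.0 means questions are spread evenly across KG types; values near 0
    mean a few KG types dominate. With fewer than two tags there is nothing
    to compare, so 1.0 is returned by convention.
    """
    if len(counts) < 2:
        return 1.0
    total = sum(counts.values())
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    return entropy / log(len(counts))


counts = kg_distribution(tagged_questions)
balance = normalized_entropy(counts)

# Report KG types from most to least frequent, then the balance score.
for kg, n in counts.most_common():
    print(f"{kg}: {n}")
print(f"normalized entropy: {balance:.3f}")
```

Under these assumptions, a targeted question generator would aim to add questions for the least frequent KG types, which moves the balance score closer to 1.0.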

Cite

Text

Bajaj et al. "Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020. doi:10.1109/CVPRW50498.2020.00201

Markdown

[Bajaj et al. "Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.](https://mlanthology.org/cvprw/2020/bajaj2020cvprw-understanding/) doi:10.1109/CVPRW50498.2020.00201

BibTeX

@inproceedings{bajaj2020cvprw-understanding,
  title     = {{Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing}},
  author    = {Bajaj, Goonmeet and Bandyopadhyay, Bortik and Schmidt, Daniel and Maneriker, Pranav and Myers, Christopher and Parthasarathy, Srinivasan},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2020},
  pages     = {1563--1566},
  doi       = {10.1109/CVPRW50498.2020.00201},
  url       = {https://mlanthology.org/cvprw/2020/bajaj2020cvprw-understanding/}
}