Scene Graph Contextualization in Visual Commonsense Reasoning

Abstract

Leveraging structured visual representations such as scene graphs benefits high-level computer vision tasks such as captioning and visual question answering. The recent Visual Commonsense Reasoning challenge focuses on the cognition aspects of question answering. The paper that introduced the challenge also presents a Recognition to Cognition (R2C) architecture tailored to this problem. We propose a modification to the R2C network that takes into account the knowledge encoded in the image by extracting the scene graph, embedding its facts, and attending to them.
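To make the "embed the facts and attend to them" step concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify how facts are embedded or which attention form is used, so everything here is an assumption for illustration: facts are taken to be (subject, predicate, object) triples, each fact is embedded as the mean of randomly initialized token vectors, and attention is scaled dot-product with a softmax. The names `embed_fact`, `attend`, `table`, and `query` are hypothetical.

```python
import math
import random


def embed_fact(fact, table, dim, rng):
    """Embed a (subject, predicate, object) triple as the mean of its token vectors.

    Assumption: tokens get random vectors on first use; a real system would
    use learned or pretrained embeddings instead.
    """
    vecs = []
    for token in fact:
        if token not in table:
            table[token] = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vecs.append(table[token])
    # Average the token vectors component by component.
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def attend(query, fact_vectors):
    """Scaled dot-product attention of one query vector over the fact vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * f for q, f in zip(query, vec)) / scale for vec in fact_vectors]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the attention-weighted sum of the fact vectors.
    context = [
        sum(w * vec[i] for w, vec in zip(weights, fact_vectors))
        for i in range(len(query))
    ]
    return weights, context


rng = random.Random(0)
dim = 8
table = {}

# Hypothetical scene graph facts for one image.
facts = [
    ("person", "holding", "cup"),
    ("cup", "on", "table"),
    ("person", "sitting on", "chair"),
]
fact_vectors = [embed_fact(f, table, dim, rng) for f in facts]

# Stand-in for a question or answer representation produced by the network.
query = [rng.gauss(0.0, 1.0) for _ in range(dim)]

weights, context = attend(query, fact_vectors)
```

Here `weights` holds one attention weight per fact, and `context` is a single vector summarizing the facts that a downstream reasoning module could consume alongside its other inputs.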

Cite

Text

Brad. "Scene Graph Contextualization in Visual Commonsense Reasoning." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00560

Markdown

[Brad. "Scene Graph Contextualization in Visual Commonsense Reasoning." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/brad2019iccvw-scene/) doi:10.1109/ICCVW.2019.00560

BibTeX

@inproceedings{brad2019iccvw-scene,
  title     = {{Scene Graph Contextualization in Visual Commonsense Reasoning}},
  author    = {Brad, Florin},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {4584-4586},
  doi       = {10.1109/ICCVW.2019.00560},
  url       = {https://mlanthology.org/iccvw/2019/brad2019iccvw-scene/}
}