Scene Graph Contextualization in Visual Commonsense Reasoning
Abstract
Leveraging structured visual representations such as scene graphs is beneficial to high-level computer vision tasks such as captioning or visual question answering. The recent Visual Commonsense Reasoning challenge focuses on the cognitive aspects of question answering. The paper that introduced the challenge also proposes a Recognition to Cognition (R2C) architecture tailored to this problem. We propose a modification to the R2C network that takes into account the knowledge encoded in the image by extracting its scene graph, embedding the resulting facts, and attending to them.
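The abstract does not say how the scene-graph facts are embedded or how attention over them is computed. As a rough illustration only, the sketch below assumes that each fact is a (subject, predicate, object) triple, embeds it as the mean of toy word vectors, and applies scaled dot-product attention from a query vector. That query stands in for something like an encoded question or answer candidate. All names here (`embed_fact`, `attend_to_facts`, the word vectors) are hypothetical, and the averaging and dot-product choices are assumptions, not the paper's implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A scene-graph fact as a (subject, predicate, object) triple, e.g. ("man", "holding", "cup").
Fact = Tuple[str, str, str]


def embed_fact(fact: Fact, word_vectors: Dict[str, List[float]]) -> List[float]:
    """Embed a fact as the mean of its three word vectors (an assumed, simple scheme)."""
    vecs = [word_vectors[token] for token in fact]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_facts(
    query: Sequence[float],
    facts: Sequence[Fact],
    word_vectors: Dict[str, List[float]],
) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention of a query vector over embedded facts.

    Returns (attention weights, one per fact; attended context vector).
    """
    fact_vecs = [embed_fact(f, word_vectors) for f in facts]
    scale = math.sqrt(len(query))
    # Score each fact by its scaled dot product with the query.
    scores = [sum(q * k for q, k in zip(query, fv)) / scale for fv in fact_vecs]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the fact embeddings.
    dim = len(query)
    context = [sum(w * fv[i] for w, fv in zip(weights, fact_vecs)) for i in range(dim)]
    return weights, context
```

For example, with two facts, ("man", "holding", "cup") and ("dog", "on", "grass"), and a query aligned with the first fact's embedding, the first fact receives the larger weight. The resulting context vector could then be combined with the model's other features. Mean pooling keeps the sketch short. A learned encoder over the triple (for instance a recurrent or MLP layer) would be a natural alternative.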
Cite
Text
Brad. "Scene Graph Contextualization in Visual Commonsense Reasoning." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00560
Markdown
[Brad. "Scene Graph Contextualization in Visual Commonsense Reasoning." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/brad2019iccvw-scene/) doi:10.1109/ICCVW.2019.00560
BibTeX
@inproceedings{brad2019iccvw-scene,
title = {{Scene Graph Contextualization in Visual Commonsense Reasoning}},
author = {Brad, Florin},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2019},
pages = {4584--4586},
doi = {10.1109/ICCVW.2019.00560},
url = {https://mlanthology.org/iccvw/2019/brad2019iccvw-scene/}
}