Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog

Abstract

Visual Dialog requires an agent to engage in a conversation with humans grounded in an image. Many studies on Visual Dialog focus on understanding the dialog history or the content of an image, while a considerable number of questions that require commonsense are ignored. Handling these questions depends on logical reasoning that requires commonsense priors. How to capture relevant commonsense knowledge complementary to the history and the image remains a key challenge. In this paper, we propose a novel model, Reasoning with Multi-structure Commonsense Knowledge (RMK). In our model, external knowledge is represented as sentence-level facts and graph-level facts, to suit the composite input of dialog history and image. On top of these multi-structure representations, our model captures relevant knowledge and incorporates it into the vision and semantic features via graph-based interaction and transformer-based fusion. Experimental results and analysis on the VisDial v1.0 and VisDialCK datasets show that our proposed model outperforms comparative methods.

Cite

Text

Zhang et al. "Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022. doi:10.1109/CVPRW56347.2022.00506

Markdown

[Zhang et al. "Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022.](https://mlanthology.org/cvprw/2022/zhang2022cvprw-reasoning/) doi:10.1109/CVPRW56347.2022.00506

BibTeX

@inproceedings{zhang2022cvprw-reasoning,
  title     = {{Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog}},
  author    = {Zhang, Shunyu and Jiang, Xiaoze and Yang, Zequn and Wan, Tao and Qin, Zengchang},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2022},
  pages     = {4599--4608},
  doi       = {10.1109/CVPRW56347.2022.00506},
  url       = {https://mlanthology.org/cvprw/2022/zhang2022cvprw-reasoning/}
}