Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog

Shunyu Zhang, Xiaoze Jiang, Zequn Yang, Tao Wan, Zengchang Qin

CVPRW 2022 pp. 4599-4608

doi:10.1109/CVPRW56347.2022.00506

Abstract

Visual Dialog requires an agent to converse with humans about an image. Many studies on Visual Dialog focus on understanding the dialog history or the image content, while a considerable number of questions that require commonsense are ignored. Handling these questions depends on logical reasoning with commonsense priors. Capturing relevant commonsense knowledge that complements the history and the image remains a key challenge. In this paper, we propose a novel model, Reasoning with Multi-structure Commonsense Knowledge (RMK). In our model, external knowledge is represented as sentence-level facts and graph-level facts, to suit a scenario that combines dialog history and image. On top of these multi-structure representations, our model captures relevant knowledge and incorporates it into the visual and semantic features via graph-based interaction and transformer-based fusion. Experimental results and analysis on the VisDial v1.0 and VisDialCK datasets show that our proposed model outperforms comparative methods.
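
To give a feel for what attending over sentence-level facts could look like, here is a minimal sketch, not the authors' implementation. It assumes the question and each fact are already encoded as fixed-size vectors, scores the facts with scaled dot-product attention, and adds the attended knowledge vector back to the question feature. The function names (`attend_to_facts`, `fuse`), the residual addition, and the toy vectors are all hypothetical; the abstract does not specify the actual graph-based interaction or transformer-based fusion layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_facts(query, fact_embeddings):
    """Scaled dot-product attention of one query vector over fact vectors.

    Returns the attention weights and the weighted sum of the facts.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * f for q, f in zip(query, fact)) / scale
        for fact in fact_embeddings
    ]
    weights = softmax(scores)
    knowledge = [
        sum(w * fact[i] for w, fact in zip(weights, fact_embeddings))
        for i in range(len(query))
    ]
    return weights, knowledge


def fuse(query, knowledge):
    """Hypothetical fusion step: residual addition of the knowledge vector."""
    return [q + k for q, k in zip(query, knowledge)]


# Toy example: a question embedding and three made-up fact embeddings.
question = [1.0, 0.0, 0.0]
facts = [
    [0.9, 0.1, 0.0],  # most similar to the question
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
weights, knowledge = attend_to_facts(question, facts)
fused = fuse(question, knowledge)
```

In this toy setup the first fact receives the largest attention weight because it is closest to the question vector; a real system would learn the encoders and use multi-head attention over many retrieved facts.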

Cite

Text

Zhang et al. "Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022. doi:10.1109/CVPRW56347.2022.00506

Markdown

[Zhang et al. "Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022.](https://mlanthology.org/cvprw/2022/zhang2022cvprw-reasoning/) doi:10.1109/CVPRW56347.2022.00506

BibTeX

@inproceedings{zhang2022cvprw-reasoning,
  title     = {{Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog}},
  author    = {Zhang, Shunyu and Jiang, Xiaoze and Yang, Zequn and Wan, Tao and Qin, Zengchang},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2022},
  pages     = {4599--4608},
  doi       = {10.1109/CVPRW56347.2022.00506},
  url       = {https://mlanthology.org/cvprw/2022/zhang2022cvprw-reasoning/}
}