Knowledge-Enhanced Scene Graph Generation with Multimodal Relation Alignment (Student Abstract)

Abstract

Existing scene graph generation methods struggle when an image lacks sufficient visual context. To address this limitation, we propose a knowledge-enhanced scene graph generation model with multimodal relation alignment, which supplements the missing visual context with well-aligned textual knowledge. First, we encode the textual information as contextualized knowledge, guided by the visual objects, to enrich the context. Furthermore, we align the multimodal relation triplets with a co-attention module for better semantic fusion. Experimental results demonstrate the effectiveness of our method.
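To make the co-attention alignment step more concrete, below is a minimal, hypothetical PyTorch sketch of bidirectional cross-attention between visual relation features and textual knowledge features. The module name, dimensions, pooling, and fusion strategy are all assumptions for illustration; the abstract does not specify the authors' actual implementation.

```python
# Hypothetical sketch of co-attention fusion between visual relation
# features and contextualized textual knowledge features (PyTorch).
# All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class CoAttentionFusion(nn.Module):
    """Aligns visual relation triplet features with textual knowledge
    features via bidirectional cross-attention, then fuses them."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Visual features attend over textual knowledge, and vice versa.
        self.vis_to_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.txt_to_vis = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, vis: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # vis: (B, N_v, dim) visual relation features
        # txt: (B, N_t, dim) contextualized textual knowledge features
        vis_enh, _ = self.vis_to_txt(query=vis, key=txt, value=txt)
        txt_enh, _ = self.txt_to_vis(query=txt, key=vis, value=vis)
        # Pool the visually-guided textual stream and concatenate it with
        # each knowledge-enhanced visual relation feature before fusion.
        txt_pooled = txt_enh.mean(dim=1, keepdim=True).expand_as(vis_enh)
        return self.fuse(torch.cat([vis_enh, txt_pooled], dim=-1))


if __name__ == "__main__":
    model = CoAttentionFusion()
    vis = torch.randn(2, 10, 512)   # 10 candidate relation triplets
    txt = torch.randn(2, 20, 512)   # 20 textual knowledge tokens
    print(model(vis, txt).shape)    # torch.Size([2, 10, 512])
```

The key design point the sketch illustrates is that alignment runs in both directions: visual relation features query the textual knowledge and the textual knowledge queries the visual features, so each modality's representation is conditioned on the other before the final fusion.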

Cite

Text

Fu et al. "Knowledge-Enhanced Scene Graph Generation with Multimodal Relation Alignment (Student Abstract)." AAAI Conference on Artificial Intelligence, 2022. doi:10.1609/AAAI.V36I11.21610

Markdown

[Fu et al. "Knowledge-Enhanced Scene Graph Generation with Multimodal Relation Alignment (Student Abstract)." AAAI Conference on Artificial Intelligence, 2022.](https://mlanthology.org/aaai/2022/fu2022aaai-knowledge/) doi:10.1609/AAAI.V36I11.21610

BibTeX

@inproceedings{fu2022aaai-knowledge,
  title     = {{Knowledge-Enhanced Scene Graph Generation with Multimodal Relation Alignment (Student Abstract)}},
  author    = {Fu, Ze and Feng, Junhao and Zheng, Changmeng and Cai, Yi},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {12947--12948},
  doi       = {10.1609/AAAI.V36I11.21610},
  url       = {https://mlanthology.org/aaai/2022/fu2022aaai-knowledge/}
}