Text-Guided Dual Interaction for Multimodal Relation Extraction in Social Media
Abstract
Multimodal relation extraction is essential for information extraction and knowledge graph construction. In social media, however, text and images often lack relevance or are only weakly connected, which can mislead models. While many current approaches focus on modality alignment and fusion, they overlook the role of the domain-specific modality in mitigating information bias. Moreover, the significant gap between modalities makes it challenging to establish deep associative relationships. To tackle these challenges, we propose the Text-Guided Dual Interaction (TGDI) model, which incorporates a Modal Dual-Interaction mechanism. Specifically, the Cross-Modal Interaction module performs global-level fusion to achieve initial alignment, while the Text-Oriented Interaction module refines this integration by preserving essential visual information under textual guidance. Additionally, the Text Modulated Matching Gate regulates visual contributions and evaluates image-text similarity to minimize visual noise. Finally, the fusion function adapts to diverse text-image scenarios, ensuring effective relation extraction. Extensive experiments on the Twitter dataset demonstrate that TGDI not only surpasses state-of-the-art baselines but also robustly suppresses the influence of irrelevant visual content in real-world multimodal settings.
Cite
Text
Zhang and Guo. "Text-Guided Dual Interaction for Multimodal Relation Extraction in Social Media." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06109-6_26
Markdown
[Zhang and Guo. "Text-Guided Dual Interaction for Multimodal Relation Extraction in Social Media." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/zhang2025ecmlpkdd-textguided/) doi:10.1007/978-3-032-06109-6_26
BibTeX
@inproceedings{zhang2025ecmlpkdd-textguided,
title = {{Text-Guided Dual Interaction for Multimodal Relation Extraction in Social Media}},
author = {Zhang, Yachuan and Guo, Yi},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2025},
pages = {454--469},
doi = {10.1007/978-3-032-06109-6_26},
url = {https://mlanthology.org/ecmlpkdd/2025/zhang2025ecmlpkdd-textguided/}
}