Text-Guided Dual Interaction for Multimodal Relation Extraction in Social Media
Abstract
Multimodal relation extraction is essential for information extraction and knowledge graph construction. In social media, however, text and images often lack relevance or are only weakly connected, which can mislead models. While many current approaches focus on modality alignment and fusion, they overlook the role of the domain-specific modality in mitigating information bias. Moreover, the significant gap between modalities makes it challenging to establish deep associative relationships. To tackle these challenges, we propose the Text-Guided Dual Interaction (TGDI) model, which incorporates a Modal Dual-Interaction mechanism. Specifically, the Cross-Modal Interaction module performs global-level fusion to achieve initial alignment, while the Text-Oriented Interaction module refines this integration by preserving essential visual information under textual guidance. Additionally, the Text Modulated Matching Gate regulates visual contributions and evaluates image-text similarity to minimize visual noise. Finally, the fusion function adapts to diverse text-image scenarios, ensuring effective relation extraction. Extensive experiments on the Twitter dataset demonstrate that TGDI not only surpasses state-of-the-art baselines but also robustly suppresses the influence of irrelevant visual content in real-world multimodal settings.
Cite
Text
Zhang and Guo. "Text-Guided Dual Interaction for Multimodal Relation Extraction in Social Media." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06109-6_26
Markdown
[Zhang and Guo. "Text-Guided Dual Interaction for Multimodal Relation Extraction in Social Media." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/zhang2025ecmlpkdd-textguided/) doi:10.1007/978-3-032-06109-6_26
BibTeX
@inproceedings{zhang2025ecmlpkdd-textguided,
title = {{Text-Guided Dual Interaction for Multimodal Relation Extraction in Social Media}},
author = {Zhang, Yachuan and Guo, Yi},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2025},
pages = {454--469},
doi = {10.1007/978-3-032-06109-6_26},
url = {https://mlanthology.org/ecmlpkdd/2025/zhang2025ecmlpkdd-textguided/}
}