WDMIR: Wavelet-Driven Multimodal Intent Recognition

Gong, Weiyin; Zhang, Kai; Zhang, Yanghai; Liu, Qi; Sun, Xinjie; Lu, Junyu; Zhu, Linbo

doi:10.24963/IJCAI.2025/582

IJCAI 2025 pp. 5226-5234

Abstract

Multimodal intent recognition (MIR) seeks to accurately interpret user intentions by integrating verbal and non-verbal information across video, audio, and text modalities. While existing approaches prioritize text analysis, they often overlook the rich semantic content embedded in non-verbal cues. This paper presents a novel Wavelet-Driven Multimodal Intent Recognition (WDMIR) framework that enhances intent understanding through frequency-domain analysis of non-verbal information. Specifically, we propose: (1) a wavelet-driven fusion module that performs synchronized decomposition and integration of video-audio features in the frequency domain, enabling fine-grained analysis of temporal dynamics; and (2) a cross-modal interaction mechanism that facilitates progressive feature enhancement from bimodal to trimodal integration, effectively bridging the semantic gap between verbal and non-verbal information. Extensive experiments on MIntRec demonstrate that our approach achieves state-of-the-art performance, surpassing previous methods by 1.13% in accuracy. Ablation studies further verify that the wavelet-driven fusion module significantly improves the extraction of semantic information from non-verbal sources, with a 0.41% increase in recognition accuracy when analyzing subtle emotional cues.
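To make the idea of synchronized frequency-domain decomposition and integration concrete, the following is a minimal sketch and not the authors' implementation. It assumes time-aligned video and audio feature sequences of shape (T, D) with even T, applies a one-level Haar wavelet transform along the time axis to both, and merges the bands with a fixed rule borrowed from classical wavelet fusion (average the low-frequency band, keep the larger-magnitude high-frequency coefficient). The function names `haar_dwt`, `haar_idwt`, and `wavelet_fuse` are hypothetical, and the fixed rule stands in for whatever learned fusion the paper uses.

```python
import numpy as np


def haar_dwt(x):
    """One-level Haar transform along time. x has shape (T, D), T even."""
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # slow temporal trends
    high = (even - odd) / np.sqrt(2.0)  # rapid frame-to-frame changes
    return low, high


def haar_idwt(low, high):
    """Inverse of haar_dwt: rebuild the (T, D) sequence from both bands."""
    even = (low + high) / np.sqrt(2.0)
    odd = (low - high) / np.sqrt(2.0)
    out = np.empty((2 * low.shape[0], low.shape[1]), dtype=low.dtype)
    out[0::2] = even
    out[1::2] = odd
    return out


def wavelet_fuse(video, audio):
    """Fuse two aligned (T, D) sequences band by band (assumed fixed rule)."""
    v_low, v_high = haar_dwt(video)
    a_low, a_high = haar_dwt(audio)
    # Assumption: average the low band, keep the stronger high-band coefficient.
    fused_low = 0.5 * (v_low + a_low)
    fused_high = np.where(np.abs(v_high) >= np.abs(a_high), v_high, a_high)
    return haar_idwt(fused_low, fused_high)


rng = np.random.default_rng(0)
video = rng.standard_normal((8, 4))  # placeholder video features
audio = rng.standard_normal((8, 4))  # placeholder audio features
fused = wavelet_fuse(video, audio)   # same (8, 4) shape as the inputs
```

Both modalities are decomposed at the same temporal positions, so each fused coefficient combines video and audio evidence from the same pair of time steps. A learned module would presumably replace the fixed averaging and max-magnitude rules with trainable weights.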

Cite

Text

Gong et al. "WDMIR: Wavelet-Driven Multimodal Intent Recognition." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/582

Markdown

[Gong et al. "WDMIR: Wavelet-Driven Multimodal Intent Recognition." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/gong2025ijcai-wdmir/) doi:10.24963/IJCAI.2025/582

BibTeX

@inproceedings{gong2025ijcai-wdmir,
  title     = {{WDMIR: Wavelet-Driven Multimodal Intent Recognition}},
  author    = {Gong, Weiyin and Zhang, Kai and Zhang, Yanghai and Liu, Qi and Sun, Xinjie and Lu, Junyu and Zhu, Linbo},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {5226-5234},
  doi       = {10.24963/IJCAI.2025/582},
  url       = {https://mlanthology.org/ijcai/2025/gong2025ijcai-wdmir/}
}