Multi-to-Single: Reducing Multimodal Dependency in Emotion Recognition Through Contrastive Learning

Liu, Yan-Kai; Cai, Jinyu; Lu, Bao-Liang; Zheng, Wei-Long

doi:10.1609/AAAI.V39I2.32134

AAAI 2025 pp. 1438-1446

Abstract

Multimodal emotion recognition is a crucial research area in affective brain-computer interfaces. In practical applications, however, it is often difficult to obtain all modalities simultaneously. To tackle this problem, researchers have turned to cross-modal methods that learn multimodal representations from fewer modalities. However, because the distributions of different modalities differ substantially, it is difficult for any single modality to fully learn multimodal features. To address this limitation, we propose a Multi-to-Single (M2S) emotion recognition model that leverages contrastive learning and incorporates two novel modules: 1) a spatial and temporal-sparse (STS) attention mechanism that enhances the encoders' ability to extract features from the data; 2) a Multi-to-Multi Contrastive Predictive Coding (M2M CPC) module that learns and fuses features across different modalities. At test time, only a single modality is used for emotion recognition, reducing the dependence on multimodal data. Extensive experiments on five public multimodal emotion datasets demonstrate that our model achieves state-of-the-art performance on cross-modal tasks and maintains multimodal performance using only a single modality.
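Contrastive objectives such as those behind CPC typically score each positive pair against in-batch negatives with an InfoNCE loss. The sketch below shows only that generic scoring step for two modalities; it is not the authors' M2M CPC module, whose predictive structure the abstract does not specify. It assumes paired per-sample embeddings from two modality encoders, and the helper name `info_nce`, the temperature of 0.1, and the synthetic inputs are all hypothetical illustrations.

```python
import numpy as np


def info_nce(feats_a: np.ndarray, feats_b: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss between paired embeddings of two modalities.

    Hypothetical helper: row i of feats_a and row i of feats_b are assumed to
    describe the same sample (the positive pair); every other row in the batch
    serves as a negative. Not the M2M CPC objective from the paper.
    """
    # L2-normalise each embedding so dot products become cosine similarities
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (N, N); diagonal entries are the positives

    def diag_cross_entropy(scores: np.ndarray) -> float:
        # Row-wise log-softmax with max-subtraction for numerical stability,
        # then the mean negative log-probability of the diagonal targets
        shifted = scores - scores.max(axis=1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_probs)))

    # Average the A->B and B->A directions so neither modality is privileged
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


# Usage with synthetic stand-ins for two encoders' outputs (batch of 8, dim 16)
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(8, 16))
feats_b = feats_a + 0.05 * rng.normal(size=(8, 16))  # well-aligned partner modality
unrelated = rng.normal(size=(8, 16))                  # no correspondence at all
print(info_nce(feats_a, feats_b), info_nce(feats_a, unrelated))
```

The aligned pair should yield a much lower loss than the unrelated pair; minimising this quantity pulls corresponding samples from the two modalities together in the shared embedding space, which is the general mechanism contrastive cross-modal methods rely on.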

Cite

Text

Liu et al. "Multi-to-Single: Reducing Multimodal Dependency in Emotion Recognition Through Contrastive Learning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I2.32134

Markdown

[Liu et al. "Multi-to-Single: Reducing Multimodal Dependency in Emotion Recognition Through Contrastive Learning." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/liu2025aaai-multi-b/) doi:10.1609/AAAI.V39I2.32134

BibTeX

@inproceedings{liu2025aaai-multi-b,
  title     = {{Multi-to-Single: Reducing Multimodal Dependency in Emotion Recognition Through Contrastive Learning}},
  author    = {Liu, Yan-Kai and Cai, Jinyu and Lu, Bao-Liang and Zheng, Wei-Long},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {1438-1446},
  doi       = {10.1609/AAAI.V39I2.32134},
  url       = {https://mlanthology.org/aaai/2025/liu2025aaai-multi-b/}
}