Integrating Vision-Language Semantic Graphs in Multi-View Clustering

Abstract

Multimodal emotion recognition has garnered significant attention for its ability to integrate data from multiple modalities to enhance performance. However, physiological signals such as the electroencephalogram (EEG) are more difficult to acquire than visual data owing to higher collection costs and complexity, which limits the practical application of multimodal networks. To address this issue, this paper proposes a cross-modal knowledge distillation framework for emotion recognition. The framework leverages the strengths of a multimodal teacher network to improve the performance of a unimodal student network that uses only the visual modality as input. Specifically, we design a prototype-based modality rebalancing strategy that dynamically adjusts the convergence rates of different modalities to mitigate the modality imbalance issue, enabling the teacher network to better integrate multimodal information. Building upon this, we develop a Cross-Modal Densely Guided Knowledge Distillation (CDGKD) method, which effectively transfers knowledge from the multimodal teacher network to the unimodal student network. CDGKD uses multi-level teacher assistant networks to bridge the teacher-student gap and employs dense guidance to reduce error accumulation during knowledge transfer. Experimental results demonstrate that the proposed framework outperforms existing methods on two public emotion datasets, providing an effective solution for emotion recognition in modality-constrained scenarios.
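The densely guided distillation idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; it is an assumed reading in which a chain of networks (teacher, teacher assistants, student) produces logits, and every later network in the chain is supervised by all earlier ones via the standard temperature-scaled distillation loss, rather than only by its immediate predecessor.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Standard distillation loss: KL divergence between the teacher's
    and student's temperature-softened distributions (scaled by T^2)."""
    p = softmax(teacher_logits, T)  # teacher (soft targets)
    q = softmax(student_logits, T)  # student
    return float(np.sum(p * (np.log(p) - np.log(q))) * T * T)

def dense_guided_loss(logits_chain, T=2.0):
    """Dense guidance (illustrative): logits_chain runs from the largest
    network (teacher) down to the student. Each later network is guided
    by ALL earlier networks in the chain, not just its neighbor, which
    is intended to limit error accumulation along the chain."""
    total = 0.0
    for i in range(1, len(logits_chain)):        # guided network
        for j in range(i):                       # all earlier guides
            total += kd_loss(logits_chain[i], logits_chain[j], T)
    return total
```

For a chain of identical logits the loss is zero, and it grows as the student's predictions drift from the teachers'; in training, this term would be added to the student's ordinary task loss.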

Cite

Text

Ke et al. "Integrating Vision-Language Semantic Graphs in Multi-View Clustering." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/472

Markdown

[Ke et al. "Integrating Vision-Language Semantic Graphs in Multi-View Clustering." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/ke2024ijcai-integrating/) doi:10.24963/ijcai.2024/472

BibTeX

@inproceedings{ke2024ijcai-integrating,
  title     = {{Integrating Vision-Language Semantic Graphs in Multi-View Clustering}},
  author    = {Ke, Junlong and Wen, Zichen and Yang, Yechenhao and Cui, Chenhang and Ren, Yazhou and Pu, Xiaorong and He, Lifang},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {4273--4281},
  doi       = {10.24963/ijcai.2024/472},
  url       = {https://mlanthology.org/ijcai/2024/ke2024ijcai-integrating/}
}