Emotion-Llama: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

Abstract

Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling.However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal. Moreover, existing Multimodal Large Language Models (MLLMs) face challenges in integrating audio and recognizing subtle facial micro-expressions. To address this, we introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories. This dataset enables models to learn from varied scenarios and generalize to real-world applications. Furthermore, we propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders. By aligning features into a shared space and employing a modified LLaMA model with instruction tuning, Emotion-LLaMA significantly enhances both emotional recognition and reasoning capabilities. Extensive evaluations show Emotion-LLaMA outperforms other MLLMs, achieving top scores in Clue Overlap (7.83) and Label Overlap (6.25) on EMER, an F1 score of 0.9036 on MER2023-SEMI challenge, and the highest UAR (45.59) and WAR (59.37) in zero-shot evaluations on DFEW dataset.

Cite

Text

Cheng et al. "Emotion-Llama: Multimodal Emotion Recognition and Reasoning with Instruction Tuning." Neural Information Processing Systems, 2024. doi:10.52202/079017-3518

Markdown

[Cheng et al. "Emotion-Llama: Multimodal Emotion Recognition and Reasoning with Instruction Tuning." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/cheng2024neurips-emotionllama/) doi:10.52202/079017-3518

BibTeX

@inproceedings{cheng2024neurips-emotionllama,
  title     = {{Emotion-Llama: Multimodal Emotion Recognition and Reasoning with Instruction Tuning}},
  author    = {Cheng, Zebang and Cheng, Zhi-Qi and He, Jun-Yan and Sun, Jingdong and Wang, Kai and Lin, Yuxiang and Lian, Zheng and Peng, Xiaojiang and Hauptmann, Alexander G.},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-3518},
  url       = {https://mlanthology.org/neurips/2024/cheng2024neurips-emotionllama/}
}