MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models

Zhang, Fan; Cheng, Zebang; Deng, Chong; Li, Haoxuan; Lian, Zheng; Chen, Qian; Liu, Huadai; Wang, Wen; Zhang, YiFan; Zhang, Renrui; Guo, Ziyu; Zhu, Zhihong; Wu, Hao; Wang, Haixin; Zheng, Yefeng; Peng, Xiaojiang; Wu, Xian; Wang, Kun; Li, Xiangang; Ye, Jieping; Heng, Pheng-Ann

ICLR 2026

Abstract

Recent advances in multimodal large language models (MLLMs) have catalyzed transformative progress in affective computing, enabling models to exhibit emergent emotional intelligence. Despite substantial methodological progress, current emotional benchmarks remain limited, because two questions are still open: (a) how well MLLMs generalize across distinct scenarios, and (b) how well they can reason about the factors that trigger emotional states. To bridge these gaps, we present MME-Emotion, a systematic benchmark that assesses both the emotional understanding and the reasoning capabilities of MLLMs, offering scalable capacity, diverse settings, and unified protocols. As the largest emotional intelligence benchmark for MLLMs, MME-Emotion contains over 6,000 curated video clips with task-specific question-answering (QA) pairs, spanning a broad range of scenarios and organized into eight emotional tasks. It further incorporates a holistic evaluation suite with hybrid metrics for emotion recognition and reasoning, analyzed through a multi-agent system framework. Through a rigorous evaluation of 20 advanced MLLMs, we uncover both their strengths and limitations, yielding several key insights: (1) Current MLLMs exhibit unsatisfactory emotional intelligence, with the best-performing model achieving only a 39.3% recognition score and a 56.0% Chain-of-Thought (CoT) score on our benchmark. (2) Generalist models (e.g., Gemini-2.5-Pro) derive emotional intelligence from generalized multimodal understanding capabilities, while specialist models (e.g., R1-Omni) can achieve comparable performance through domain-specific post-training adaptation. We hope that MME-Emotion can serve as a foundation for advancing the emotional intelligence of MLLMs in the future.

Cite

Text

Zhang et al. "MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models." International Conference on Learning Representations, 2026.

Markdown

[Zhang et al. "MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-mmeemotion/)

BibTeX

@inproceedings{zhang2026iclr-mmeemotion,
  title     = {{MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models}},
  author    = {Zhang, Fan and Cheng, Zebang and Deng, Chong and Li, Haoxuan and Lian, Zheng and Chen, Qian and Liu, Huadai and Wang, Wen and Zhang, YiFan and Zhang, Renrui and Guo, Ziyu and Zhu, Zhihong and Wu, Hao and Wang, Haixin and Zheng, Yefeng and Peng, Xiaojiang and Wu, Xian and Wang, Kun and Li, Xiangang and Ye, Jieping and Heng, Pheng-Ann},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhang2026iclr-mmeemotion/}
}