MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
Abstract
Unified Multimodal Large Language Models (U-MLLMs) have garnered considerable interest for their ability to seamlessly integrate generation and comprehension tasks. However, existing research lacks a unified evaluation standard, often relying on isolated benchmarks to assess these capabilities. Moreover, current work highlights the potential of “mixed-modality generation capabilities” through case studies, such as generating auxiliary lines in images to solve geometric problems, or reasoning through a problem before generating a corresponding image. Despite this, there is no standardized benchmark to assess models on such unified tasks. To address this gap, we introduce MME-Unify, also termed MME-U, the first open and reproducible benchmark designed to evaluate multimodal comprehension, generation, and mixed-modality generation capabilities. For comprehension and generation tasks, we curate a diverse set of tasks from 12 datasets, aligning their formats and metrics to develop a standardized evaluation framework. For unified tasks, we design five subtasks to rigorously assess how models’ understanding and generation capabilities can mutually enhance each other. Evaluation of 17 U-MLLMs, including Janus-Pro, Bagel, and Gemini2-Flash, reveals significant room for improvement, particularly in areas such as instruction following and image generation quality.
Cite
Text
Xie et al. "MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models." International Conference on Learning Representations, 2026.
Markdown
[Xie et al. "MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/xie2026iclr-mmeunify/)
BibTeX
@inproceedings{xie2026iclr-mmeunify,
title = {{MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models}},
author = {Xie, Wulin and Zhang, YiFan and Fu, Chaoyou and Shi, Yang and Zeng, Jianshu and Nie, Bingyan and Chen, Hongkai and Zhang, Zhang and Wang, Liang},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/xie2026iclr-mmeunify/}
}