MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models

Hua, Hang; Zeng, Ziyun; Song, Yizhi; Tang, Yunlong; He, Liu; Aliaga, Daniel; Xiong, Wei; Luo, Jiebo

NeurIPS 2025

Abstract

Recent multimodal image generators such as GPT-4o, Gemini 2.0 Flash, and Gemini 2.5 Pro excel at following complex instructions, editing images, and maintaining concept consistency. However, they are still evaluated with disjoint toolkits: text-to-image (T2I) benchmarks that lack multi-modal conditioning, and customized image generation benchmarks that overlook compositional semantics and common knowledge. We propose **MMIG-Bench**, a comprehensive **M**ulti-**M**odal **I**mage **G**eneration **Bench**mark that unifies these tasks by pairing 4,850 richly annotated text prompts with 1,750 multi-view reference images across 380 subjects, spanning humans, animals, objects, and artistic styles. **MMIG-Bench** is equipped with a three-level evaluation framework: (1) low-level metrics for visual artifacts and identity preservation of objects; (2) the novel Aspect Matching Score (AMS), a VQA-based mid-level metric that delivers fine-grained prompt-image alignment and shows strong correlation with human judgments; and (3) high-level metrics for aesthetics and human preference. Using **MMIG-Bench**, we benchmark 17 state-of-the-art models, including Gemini 2.5 Pro, FLUX, DreamBooth, and IP-Adapter, and validate our metrics with 32k human ratings, yielding in-depth insights into architecture and data design.

Cite

Text

Hua et al. "MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models." Advances in Neural Information Processing Systems, 2025.

Markdown

[Hua et al. "MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/hua2025neurips-mmigbench/)

BibTeX

@inproceedings{hua2025neurips-mmigbench,
  title     = {{MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models}},
  author    = {Hua, Hang and Zeng, Ziyun and Song, Yizhi and Tang, Yunlong and He, Liu and Aliaga, Daniel and Xiong, Wei and Luo, Jiebo},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/hua2025neurips-mmigbench/}
}