M4Bench: A Benchmark of Multi-Domain Multi-Granularity Multi-Image Understanding for Multi-Modal Large Language Models
Abstract
Growing demand for analyzing complex, interrelated scenes makes it necessary to study multi-image understanding. Unlike understanding individual images, multi-image inference requires capturing both the alignments and the differences between images in order to understand their intricate relationships. However, existing benchmarks struggle to address both aspects simultaneously, which hinders modeling relationships across image granularities and domains. In this paper, we introduce M4Bench to enhance the capability of aligning and distinguishing multiple images through multi-domain, multi-granularity comparison. We carefully design five comparison tasks covering coarse-grained and fine-grained granularities in single and multiple image domains, and evaluate 13 state-of-the-art multi-modal large language models of various sizes on them. We also analyze the evaluation results and offer several observations and viewpoints for multi-image understanding research. The data and evaluation code are available at https://github.com/eaglelab-zju/M4Bench.
Cite
Text
Ye et al. "M4Bench: A Benchmark of Multi-Domain Multi-Granularity Multi-Image Understanding for Multi-Modal Large Language Models." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/762
Markdown
[Ye et al. "M4Bench: A Benchmark of Multi-Domain Multi-Granularity Multi-Image Understanding for Multi-Modal Large Language Models." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/ye2025ijcai-m/) doi:10.24963/IJCAI.2025/762
BibTeX
@inproceedings{ye2025ijcai-m,
title = {{M4Bench: A Benchmark of Multi-Domain Multi-Granularity Multi-Image Understanding for Multi-Modal Large Language Models}},
author = {Ye, Xiaojun and Liang, Guanbao and Wang, Chun and Li, Liangcheng and Ke, Pengfei and Wang, Rui and Jia, Bingxin and Huang, Gang and Sun, Qiao and Zhou, Sheng},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {6848--6856},
doi = {10.24963/IJCAI.2025/762},
url = {https://mlanthology.org/ijcai/2025/ye2025ijcai-m/}
}