M4Bench: A Benchmark of Multi-Domain Multi-Granularity Multi-Image Understanding for Multi-Modal Large Language Models

Ye, Xiaojun; Liang, Guanbao; Wang, Chun; Li, Liangcheng; Ke, Pengfei; Wang, Rui; Jia, Bingxin; Huang, Gang; Sun, Qiao; Zhou, Sheng

doi:10.24963/IJCAI.2025/762

IJCAI 2025 pp. 6848-6856

Abstract

The growing demand for analyzing complex, interrelated scenes makes it necessary to study multi-image understanding. Unlike single-image understanding, multi-image inference depends on both the alignments and the differences between images, which together capture their intricate relationships. However, existing benchmarks struggle to address both aspects simultaneously, which hinders modeling relationships across different image granularities and domains. In this paper, we introduce M4Bench to enhance the capability of aligning and distinguishing multiple images through multi-domain, multi-granularity comparison. We carefully design five comparison tasks covering coarse-grained and fine-grained granularities in single and multiple image domains, and use them to evaluate 13 state-of-the-art multi-modal large language models of various sizes. We also analyze the evaluation results and offer several observations and viewpoints for multi-image understanding research. The data and evaluation code are available at https://github.com/eaglelab-zju/M4Bench.

Cite

Text

Ye et al. "M4Bench: A Benchmark of Multi-Domain Multi-Granularity Multi-Image Understanding for Multi-Modal Large Language Models." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/762

Markdown

[Ye et al. "M4Bench: A Benchmark of Multi-Domain Multi-Granularity Multi-Image Understanding for Multi-Modal Large Language Models." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/ye2025ijcai-m/) doi:10.24963/IJCAI.2025/762

BibTeX

@inproceedings{ye2025ijcai-m,
  title     = {{M4Bench: A Benchmark of Multi-Domain Multi-Granularity Multi-Image Understanding for Multi-Modal Large Language Models}},
  author    = {Ye, Xiaojun and Liang, Guanbao and Wang, Chun and Li, Liangcheng and Ke, Pengfei and Wang, Rui and Jia, Bingxin and Huang, Gang and Sun, Qiao and Zhou, Sheng},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {6848--6856},
  doi       = {10.24963/IJCAI.2025/762},
  url       = {https://mlanthology.org/ijcai/2025/ye2025ijcai-m/}
}