PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology

Sun, Yuxuan; Wu, Hao; Zhu, Chenglu; Zheng, Sunyi; Chen, Qizi; Zhang, Kai; Zhang, Yunlong; Wan, Dan; Lan, Xiaoxiao; Zheng, Mengyue; Li, Jingxiong; Lyu, Xinheng; Lin, Tao; Yang, Lin

doi:10.1007/978-3-031-73033-7_4

ECCV 2024

Abstract

The emergence of Large Multimodal Models (LMMs) has unlocked remarkable potential in AI, particularly in pathology. However, the lack of a specialized, high-quality benchmark has impeded their development and precise evaluation. To address this, we introduce PathMMU, the largest and highest-quality expert-validated pathology benchmark for LMMs. It comprises 33,428 multimodal multiple-choice questions and 24,067 images from various sources, each accompanied by an explanation for the correct answer. The construction of PathMMU leverages GPT-4V’s advanced capabilities, utilizing over 30,000 image-caption pairs to enrich the descriptive quality of captions and generate corresponding Q&As in a cascading process. To maximize PathMMU’s authority, we invite seven pathologists to scrutinize each question in PathMMU’s validation and test sets under strict standards, while simultaneously setting an expert-level performance benchmark for PathMMU. We conduct extensive evaluations, including zero-shot assessments of 14 open-source and 4 closed-source LMMs and of their robustness to image corruption. We also fine-tune representative LMMs to assess their adaptability to PathMMU. The empirical findings indicate that advanced LMMs struggle with the challenging PathMMU benchmark: the top-performing LMM, GPT-4V, achieves only 49.8% zero-shot performance, significantly lower than the 71.8% demonstrated by human pathologists. After fine-tuning, substantially smaller open-source LMMs can outperform GPT-4V but still fall short of the expertise shown by pathologists. We hope that PathMMU will offer valuable insights and foster the development of more specialized, next-generation LMMs for pathology.
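As a rough illustration of how a multiple-choice benchmark of this kind might be scored, the sketch below computes a percentage score and the gap between two scores. It assumes that the reported performance is plain accuracy over multiple-choice items; the abstract does not specify the metric. The record layout (`MultipleChoiceItem` and its fields) and the helper names are hypothetical and are not taken from the PathMMU release.

```python
from dataclasses import dataclass


@dataclass
class MultipleChoiceItem:
    """One hypothetical benchmark record; field names are illustrative only."""
    question: str
    options: list[str]
    answer: str       # letter of the correct option, e.g. "B"
    prediction: str   # letter returned by the model under evaluation


def accuracy(items: list[MultipleChoiceItem]) -> float:
    """Return the percentage of items whose prediction matches the answer."""
    if not items:
        raise ValueError("cannot score an empty item list")
    correct = sum(
        item.prediction.strip().upper() == item.answer.strip().upper()
        for item in items
    )
    return 100.0 * correct / len(items)


def gap_in_points(model_accuracy: float, expert_accuracy: float) -> float:
    """Difference between expert and model accuracy, in percentage points."""
    return round(expert_accuracy - model_accuracy, 1)
```

Under that assumption, `gap_in_points(49.8, 71.8)` gives the difference between the GPT-4V and pathologist figures quoted in the abstract, in percentage points.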

Cite

Text

Sun et al. "PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73033-7_4

Markdown

[Sun et al. "PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/sun2024eccv-pathmmu/) doi:10.1007/978-3-031-73033-7_4

BibTeX

@inproceedings{sun2024eccv-pathmmu,
  title     = {{PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology}},
  author    = {Sun, Yuxuan and Wu, Hao and Zhu, Chenglu and Zheng, Sunyi and Chen, Qizi and Zhang, Kai and Zhang, Yunlong and Wan, Dan and Lan, Xiaoxiao and Zheng, Mengyue and Li, Jingxiong and Lyu, Xinheng and Lin, Tao and Yang, Lin},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73033-7_4},
  url       = {https://mlanthology.org/eccv/2024/sun2024eccv-pathmmu/}
}