Multi-Modal Foundation Models for Computational Pathology: A Survey
Abstract
Foundation models have emerged as a powerful paradigm in computational pathology (CPath), enabling scalable and generalizable analysis of histopathological images. While early developments centered on uni-modal models trained solely on visual data, recent advances have highlighted the promise of multi-modal foundation models that integrate heterogeneous data sources such as textual reports, structured domain knowledge, and molecular profiles. In this survey, we provide a comprehensive and up-to-date review of multi-modal foundation models in CPath, with a particular focus on models built upon hematoxylin and eosin (H&E)-stained whole slide images (WSIs) and tile-level representations. We categorize 34 state-of-the-art multi-modal foundation models into three major paradigms: vision-language, vision-knowledge graph, and vision-gene expression. We further divide vision-language models into non-LLM-based and large language model (LLM)-based approaches. Additionally, we analyze 30 available multi-modal datasets tailored for pathology, grouped into image-text pairs, instruction datasets, and image-other modality pairs. Our survey also presents a taxonomy of downstream tasks, highlights training and evaluation strategies, and identifies key challenges and future directions. We aim for this survey to serve as a valuable resource for researchers and practitioners working at the intersection of pathology and AI.
Cite
Li et al. "Multi-Modal Foundation Models for Computational Pathology: A Survey." Transactions on Machine Learning Research, 2025.
@article{li2025tmlr-multimodal,
title = {{Multi-Modal Foundation Models for Computational Pathology: A Survey}},
author = {Li, Dong and Wan, Guihong and Wu, Xintao and Wu, Xinyu and Chen, Xiaohui and He, Yi and Chen, Zhong and Sorger, Peter K and Zhao, Chen},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/li2025tmlr-multimodal/}
}