Evaluation of Medical Large Language Models: Taxonomy, Review, and Directions
Abstract
The integration of Large Language Models (LLMs) into medicine presents both great opportunities and significant challenges, particularly in ensuring that these models are accurate, reliable, and safe. While LLMs have shown impressive capabilities in understanding and generating human language, their application in medicine requires careful evaluation, because medical applications are inherently linked to patient health and life. Current evaluations of LLMs in medicine are often fragmented and insufficient: standardized performance metrics are lacking, real patient data are used only to a limited extent, and important applications such as documentation, education, and research receive too little attention. Furthermore, traditional NLP-based evaluations are often inadequate for assessing the text generated by LLMs. Robust evaluation is therefore essential to ensure the responsible and effective use of LLMs in medical settings and to address the challenges inherent in their implementation. This paper explores the various dimensions of LLM evaluation in the medical domain, proposes a new taxonomy for categorizing medical applications, and discusses directions for future research in this critical area.
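To make the point about traditional NLP-based evaluations concrete, here is a minimal sketch. It assumes that these evaluations refer to lexical-overlap scores in the style of ROUGE-1. The function `unigram_f1` and the example sentences are hypothetical illustrations, not the authors' method. The score is a simplified unigram F1 that uses lowercase whitespace tokens and no stemming. It rewards surface overlap with a reference report, regardless of clinical meaning.

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1-style F1 (hypothetical helper): harmonic mean of
    unigram precision and recall over lowercase whitespace tokens."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    # Clipped overlap: Counter intersection keeps the minimum count of each token.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Illustrative sentences (assumed, not taken from the paper).
reference = "chest x-ray shows no evidence of pneumonia"
contradiction = "chest x-ray shows evidence of pneumonia"  # reverses the finding
paraphrase = "the lungs look clear on the radiograph"      # clinically consistent

print(round(unigram_f1(reference, contradiction), 3))  # 0.923
print(round(unigram_f1(reference, paraphrase), 3))     # 0.0
```

The sentence that reverses the clinical finding scores about 0.92. The clinically consistent paraphrase scores 0. This gap between surface overlap and clinical correctness is one reason overlap metrics alone can be inadequate for evaluating LLM-generated medical text.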
Cite
Lacerda et al. "Evaluation of Medical Large Language Models: Taxonomy, Review, and Directions." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/1169
@inproceedings{lacerda2025ijcai-evaluation,
title = {{Evaluation of Medical Large Language Models: Taxonomy, Review, and Directions}},
author = {Lacerda, Anísio and Pappa, Gisele L. and Pereira, Adriano César Machado and Meira, Jr., Wagner and de Almeida Barros, Alexandre Guimarães},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {10528--10536},
doi = {10.24963/IJCAI.2025/1169},
url = {https://mlanthology.org/ijcai/2025/lacerda2025ijcai-evaluation/}
}