Quantitative Evaluation of Multimodal LLMs in Pediatric Radiology Report Generation

Abstract

Pediatric radiology presents unique challenges due to the distinct physiological and anatomical characteristics of children, setting it apart from general adult-focused radiology. While recent applications of Multimodal Large Language Models (MLLMs) such as GPT-4o and LLaVA-Med have shown promise in radiology report generation, these models are predominantly trained on adult datasets, with limited coverage of pediatric knowledge. In this study, we investigate this gap by evaluating and benchmarking MLLMs on pediatric chest X-ray pneumonia cases, demonstrating the critical need for dedicated pediatric training data to ensure robust, age-specific performance in MLLM-driven radiology applications.
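
The abstract describes benchmarking off-the-shelf MLLMs, such as GPT-4o, on pediatric chest X-ray pneumonia cases but does not spell out the evaluation pipeline. The sketch below is a minimal, assumed setup (not the authors' protocol): it prompts GPT-4o with a chest X-ray image and scores the generated report against a reference report with ROUGE-L. The prompt wording, metric choice, and file names are illustrative assumptions.

import base64
from openai import OpenAI
from rouge_score import rouge_scorer

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_report(image_path: str) -> str:
    """Prompt GPT-4o with a chest X-ray image and return its free-text report."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Write the findings and impression for this pediatric chest X-ray."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

def score_report(candidate: str, reference: str) -> float:
    """Compare a generated report to a reference report using ROUGE-L F1."""
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    return scorer.score(reference, candidate)["rougeL"].fmeasure

# Hypothetical usage (file name and reference text are placeholders):
# report = generate_report("pediatric_cxr_0001.png")
# print(score_report(report, "Patchy right lower lobe opacity consistent with pneumonia."))

In practice, a benchmark of this kind would loop over a labeled pediatric dataset and aggregate scores per model; lexical metrics like ROUGE are only a proxy for clinical correctness.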

Cite

Text

Ding and Cao. "Quantitative Evaluation of Multimodal LLMs in Pediatric Radiology Report Generation." ICLR 2025 Workshops: AI4CHL, 2025.

Markdown

[Ding and Cao. "Quantitative Evaluation of Multimodal LLMs in Pediatric Radiology Report Generation." ICLR 2025 Workshops: AI4CHL, 2025.](https://mlanthology.org/iclrw/2025/ding2025iclrw-quantitative/)

BibTeX

@inproceedings{ding2025iclrw-quantitative,
  title     = {{Quantitative Evaluation of Multimodal LLMs in Pediatric Radiology Report Generation}},
  author    = {Ding, Zhiguang and Cao, Rodrigo},
  booktitle = {ICLR 2025 Workshops: AI4CHL},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/ding2025iclrw-quantitative/}
}