U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding

Le, Anjie; Liu, Henan; Wangyue; Liu, Zhenyu; Zhu, Rongkun; Weng, Taohan; Yu, Jinze; Wang, Boyang; Wu, Yalun; Yan, Kaiwen; Sun, Quanlin; Jiang, Meirui; Pei, Jialun; Liu, Siya; Zheng, Haoyun; Li, Zhoujun; Noble, Alison; Souquet, Jacques; Guo, Xiaoqing; Lin, Manxi; Guo, Hongcheng

ICLR 2026

Abstract

Ultrasound is a widely used imaging modality critical to global healthcare, yet its interpretation remains challenging because image quality varies with the operator, noise, and anatomical structure. Although large vision-language models (LVLMs) have demonstrated impressive multimodal capabilities across natural and medical domains, their performance on ultrasound remains largely unexplored. We introduce U2-BENCH, the first comprehensive benchmark to evaluate LVLMs on ultrasound understanding across classification, detection, regression, and text generation tasks. U2-BENCH aggregates 7,241 cases spanning 15 anatomical regions and defines 8 clinically inspired tasks, such as diagnosis, view recognition, lesion localization, clinical value estimation, and report generation, across 50 ultrasound application scenarios. We evaluate 23 state-of-the-art LVLMs, both open- and closed-source, general-purpose and medical-specific. Our results reveal strong performance on image-level classification, but persistent challenges in spatial reasoning and clinical language generation. U2-BENCH establishes a rigorous and unified testbed to assess and accelerate LVLM research in the uniquely multimodal domain of medical ultrasound imaging.

Cite

Text

Le et al. "U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding." International Conference on Learning Representations, 2026.

Markdown

[Le et al. "U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/le2026iclr-u2bench/)

BibTeX

@inproceedings{le2026iclr-u2bench,
  title     = {{U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding}},
  author    = {Le, Anjie and Liu, Henan and Wangyue,  and Liu, Zhenyu and Zhu, Rongkun and Weng, Taohan and Yu, Jinze and Wang, Boyang and Wu, Yalun and Yan, Kaiwen and Sun, Quanlin and Jiang, Meirui and Pei, Jialun and Liu, Siya and Zheng, Haoyun and Li, Zhoujun and Noble, Alison and Souquet, Jacques and Guo, Xiaoqing and Lin, Manxi and Guo, Hongcheng},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/le2026iclr-u2bench/}
}