Unified Medical Image Pre-Training in Language-Guided Common Semantic Space

Abstract

Vision-Language Pre-training (VLP) has shown its merits for analyzing medical images: it efficiently learns visual representations by leveraging supervision from the corresponding reports, and in turn facilitates the analysis and interpretation of intricate imaging data. However, this success has so far been demonstrated predominantly on single-modality data (mostly 2D images such as X-rays), and adapting VLP to learn unified representations for medical images in real-world scenarios remains an open challenge. The difficulty arises because medical images span a variety of modalities, especially modalities with different dimensionalities (e.g., 3D images such as Computed Tomography), and paired data across these dimensionalities is almost nonexistent. To overcome these challenges, we propose a Unified Medical Image Pre-training framework that uses diagnostic reports as a common semantic space to create unified representations for diverse modalities of medical images (in particular 2D and 3D images). Under the text's guidance, the framework selects text-related 2D slices from complex 3D volumes; these slices act as pseudo-pairs that bridge 2D and 3D data, ultimately enhancing consistency across medical imaging modalities. To demonstrate the effectiveness and versatility of the framework, we evaluate it on both 2D and 3D images across several datasets, covering a wide range of medical imaging tasks such as classification, segmentation, and retrieval. The framework achieves superior performance on these downstream tasks, showcasing its effectiveness in establishing a universal medical visual representation.
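To make the idea of language-guided slice selection concrete, the sketch below (not the authors' implementation) shows one plausible way to pick report-related slices from a 3D volume. It assumes hypothetical encoders slice_encoder and text_encoder that map individual slices and a diagnostic report into a shared embedding space, and simply keeps the top-k slices most similar to the report embedding so they can serve as pseudo 2D-text pairs.

import torch
import torch.nn.functional as F

def select_text_related_slices(volume, report_tokens, slice_encoder, text_encoder, k=4):
    """Pick the k slices of a 3D volume most related to a report embedding.

    volume:        (D, 1, H, W) tensor of axial slices
    report_tokens: tokenized diagnostic report (format depends on the text encoder)
    slice_encoder, text_encoder: hypothetical stand-ins for the vision/text backbones
    """
    with torch.no_grad():
        text_emb = text_encoder(report_tokens)        # (1, C) report embedding
        slice_embs = slice_encoder(volume)            # (D, C) one embedding per slice
        # Cosine similarity between each slice embedding and the report embedding
        sims = F.cosine_similarity(slice_embs, text_emb, dim=-1)   # (D,)
        top_idx = sims.topk(k=min(k, sims.numel())).indices
    # The selected slices act as pseudo-pairs bridging 2D and 3D data
    return volume[top_idx], top_idx

The top-k similarity criterion here is an illustrative assumption; the paper's actual selection mechanism may differ in how text guidance is applied.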

Cite

Text

He et al. "Unified Medical Image Pre-Training in Language-Guided Common Semantic Space." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73004-7_8

Markdown

[He et al. "Unified Medical Image Pre-Training in Language-Guided Common Semantic Space." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/he2024eccv-unified/) doi:10.1007/978-3-031-73004-7_8

BibTeX

@inproceedings{he2024eccv-unified,
  title     = {{Unified Medical Image Pre-Training in Language-Guided Common Semantic Space}},
  author    = {He, Xiaoxuan and Yang, Yifan and Jiang, Xinyang and Luo, Xufang and Hu, Haoji and Zhao, Siyun and Li, Dongsheng and Yang, Yuqing and Qiu, Lili},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73004-7_8},
  url       = {https://mlanthology.org/eccv/2024/he2024eccv-unified/}
}