MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-Ray Diagnosis

Abstract

In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge by exploiting the paired image-text reports from daily radiological practice. In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract medical-related information, avoiding unnecessary complexity from language grammar and strengthening the supervision signals; Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in the medical field and implicitly build relationships between medical entities in the language embedding space; Third, we propose a Transformer-based fusion model that spatially aligns the entity descriptions with visual signals at the image patch level, enabling medical diagnosis; Fourth, we conduct thorough experiments to validate the effectiveness of our architecture and benchmark it on numerous public datasets, e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax, COVIDx CXR-2, COVID Rural, and EdemaSeverity. In both zero-shot and fine-tuning settings, our model demonstrates strong performance compared with prior methods on disease classification and grounding.
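The third contribution, aligning entity descriptions with image patches to support diagnosis, can be illustrated with a toy example. The block below is a minimal sketch under stated assumptions, not the authors' implementation. It uses single-head scaled dot-product cross-attention in which each entity query (standing in for an encoded entity description) attends over patch features. Keys and values are the raw patch features (identity projections) rather than learned projections. A single hypothetical linear classifier with a sigmoid then scores whether each entity is present. The helper names `cross_attend` and `diagnose`, the classifier weights, and all vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Vector, patches: List[Vector]) -> Tuple[Vector, Vector]:
    """Single-head scaled dot-product attention of one entity query over patches.

    Assumption: keys and values are the raw patch features (identity
    projections); a real Transformer layer would use learned projections.
    Returns (fused feature, attention weights over patches).
    """
    d = len(query)
    scores = [dot(query, p) / math.sqrt(d) for p in patches]
    weights = softmax(scores)
    fused = [sum(w * p[i] for w, p in zip(weights, patches)) for i in range(d)]
    return fused, weights


def diagnose(
    entity_queries: Dict[str, Vector],
    patches: List[Vector],
    classifier_w: Vector,
    classifier_b: float = 0.0,
) -> Dict[str, Tuple[float, Vector]]:
    """Hypothetical head: for each entity, return (existence probability, patch attention map)."""
    results = {}
    for name, query in entity_queries.items():
        fused, weights = cross_attend(query, patches)
        logit = dot(classifier_w, fused) + classifier_b
        prob = 1.0 / (1.0 + math.exp(-logit))
        results[name] = (prob, weights)
    return results


if __name__ == "__main__":
    # Toy 2-d features for three image patches (hypothetical values).
    patches = [[2.0, 0.0], [0.0, 0.5], [0.1, 0.1]]
    # Hypothetical entity queries standing in for encoded entity descriptions.
    queries = {"pneumothorax": [1.0, 0.0], "effusion": [0.0, 1.0]}
    for name, (prob, weights) in diagnose(queries, patches, [1.0, 1.0]).items():
        best = max(range(len(weights)), key=weights.__getitem__)
        # pneumothorax attends most to patch 0, effusion to patch 1
        print(f"{name}: p={prob:.3f}, most-attended patch={best}")
```

In this sketch, the attention weights form a per-entity map over patches. That map is one plausible way a patch-level alignment could support the grounding task in the abstract, while the fused feature feeds the classification score.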

Cite

Text

Wu et al. "MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-Ray Diagnosis." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01954

Markdown

[Wu et al. "MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-Ray Diagnosis." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/wu2023iccv-medklip/) doi:10.1109/ICCV51070.2023.01954

BibTeX

@inproceedings{wu2023iccv-medklip,
  title     = {{MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-Ray Diagnosis}},
  author    = {Wu, Chaoyi and Zhang, Xiaoman and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {21372-21383},
  doi       = {10.1109/ICCV51070.2023.01954},
  url       = {https://mlanthology.org/iccv/2023/wu2023iccv-medklip/}
}