Relaxing Binary Constraints in Contrastive Vision-Language Medical Representation Learning
Abstract
By aligning the embeddings of paired images and captions given as input, contrastive vision-language representation learning has witnessed significant advances, as illustrated by CLIP, allowing visual encoders to learn from textual supervision and vice versa. Benefiting from millions of image-caption pairs collected from the Internet, CLIP-like models show competitive performance against fully supervised baselines. However, the learned visual representations are still undermined by the binary constraint, as most contrastive learning frameworks assume a strict one-to-one correspondence between the input pairs of data and optimize the models using the InfoNCE loss function. The embeddings of paired images and texts are aligned, while those of unpaired images and texts are pushed away from each other. In fact, there are naturally many "false negatives" among these negative pairs, since unpaired data can also have a high similarity. In this work, we aim to overcome the impact of false negatives in vision-language representation learning by introducing soft targets for estimating the similarity between unpaired images and texts, using external semantic knowledge structured in the form of graphs. The interest of such a method is demonstrated in the application context of medical imaging.
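The contrast between the binary constraint and soft targets can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `temperature` value, and the `targets` matrix are assumptions made for illustration, and the abstract does not specify how the graph-based similarities are computed, so the sketch simply takes a non-negative image-text similarity matrix as given. With `targets=None` the loss reduces to the usual symmetric InfoNCE with one-hot (identity) targets.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def soft_contrastive_loss(img_emb, txt_emb, targets=None, temperature=0.07):
    """Symmetric contrastive loss with optional soft targets.

    img_emb, txt_emb: arrays of shape (n, d); row i of each is a paired sample.
    targets: optional (n, n) non-negative matrix, where targets[i, j] is an
        assumed similarity between image i and text j (hypothetical input,
        e.g. derived from external knowledge). None means identity targets,
        i.e. the standard binary InfoNCE setting.
    """
    # Cosine similarity logits scaled by the temperature.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature

    n = logits.shape[0]
    if targets is None:
        targets = np.eye(n)  # binary constraint: only the diagonal is positive

    # Normalize targets into a distribution per image and per text.
    t_i2t = targets / targets.sum(axis=1, keepdims=True)
    t_t2i = targets.T / targets.T.sum(axis=1, keepdims=True)

    # Cross-entropy between target distributions and predicted softmax.
    loss_i2t = -(t_i2t * log_softmax(logits, axis=1)).sum(axis=1).mean()
    loss_t2i = -(t_t2i * log_softmax(logits.T, axis=1)).sum(axis=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

Under these assumptions, an unpaired image-text pair with a non-zero entry in `targets` is no longer treated as a pure negative, which is the relaxation the abstract describes.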
Cite
Text
Wei et al. "Relaxing Binary Constraints in Contrastive Vision-Language Medical Representation Learning." Winter Conference on Applications of Computer Vision, 2025.
Markdown
[Wei et al. "Relaxing Binary Constraints in Contrastive Vision-Language Medical Representation Learning." Winter Conference on Applications of Computer Vision, 2025.](https://mlanthology.org/wacv/2025/wei2025wacv-relaxing/)
BibTeX
@inproceedings{wei2025wacv-relaxing,
title = {{Relaxing Binary Constraints in Contrastive Vision-Language Medical Representation Learning}},
author = {Wei, Xiaoyang and Kurtz, Camille and Cloppet, Florence},
booktitle = {Winter Conference on Applications of Computer Vision},
year = {2025},
pages = {4462-4471},
url = {https://mlanthology.org/wacv/2025/wei2025wacv-relaxing/}
}