Continual Self-Supervised Learning: Towards Universal Multi-Modal Medical Data Representation Learning

Abstract

Self-supervised learning (SSL) is an efficient pre-training method for medical image analysis. However, current research is mostly confined to specific modalities, consuming considerable time and resources without achieving universality across modalities. A straightforward solution is to combine all modality data for joint SSL, but this poses practical challenges. First, our experiments reveal conflicts in representation learning as the number of modalities increases. Second, multi-modal data collected in advance cannot cover all real-world scenarios. In this paper, we reconsider versatile SSL from the perspective of continual learning and propose MedCoSS, a continual SSL approach for multi-modal medical data. Unlike joint representation learning, MedCoSS assigns different data modalities to separate training stages, creating a multi-stage pre-training process. We propose a rehearsal-based continual learning approach to manage modal conflicts and prevent catastrophic forgetting. Specifically, we use k-means sampling to retain and rehearse previous modality data during new modality learning. Moreover, we apply feature distillation and intra-modal mixup to the buffer data for knowledge retention, bypassing pretext tasks. We conduct experiments on a large-scale multi-modal unlabeled dataset comprising clinical reports, X-rays, CT, MRI, and pathological images. Experimental results demonstrate MedCoSS's exceptional generalization ability across 9 downstream datasets and its significant scalability in integrating new modality data. The code and pre-trained model are available at https://github.com/yeerwen/MedCoSS.
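The rehearsal buffer described in the abstract can be illustrated with a minimal sketch. The idea is to cluster the feature embeddings of a previous modality's samples with k-means and keep the sample nearest each centroid, so the buffer covers the modality's feature distribution with few examples. This is an illustrative interpretation of the paper's "k-means sampling", not the authors' implementation; the function name `kmeans_sample` and the nearest-to-centroid selection rule are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_sample(features: np.ndarray, buffer_size: int, seed: int = 0) -> np.ndarray:
    """Select `buffer_size` diverse sample indices for the rehearsal buffer.

    Clusters the feature embeddings into `buffer_size` groups and keeps,
    from each group, the index of the sample closest to its centroid.
    """
    km = KMeans(n_clusters=buffer_size, n_init=10, random_state=seed).fit(features)
    chosen = []
    for c in range(buffer_size):
        members = np.where(km.labels_ == c)[0]  # indices assigned to cluster c
        dists = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
        chosen.append(members[np.argmin(dists)])  # most central sample of cluster c
    return np.asarray(chosen)


# Usage: pick a 10-sample buffer from 200 synthetic 16-d embeddings.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 16)).astype(np.float32)
buffer_indices = kmeans_sample(feats, buffer_size=10)
```

During the next pre-training stage, batches drawn from `buffer_indices` would be interleaved with the new modality's data, which is where the abstract's feature distillation and intra-modal mixup would be applied.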

Cite

Text

Ye et al. "Continual Self-Supervised Learning: Towards Universal Multi-Modal Medical Data Representation Learning." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01057

Markdown

[Ye et al. "Continual Self-Supervised Learning: Towards Universal Multi-Modal Medical Data Representation Learning." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/ye2024cvpr-continual/) doi:10.1109/CVPR52733.2024.01057

BibTeX

@inproceedings{ye2024cvpr-continual,
  title     = {{Continual Self-Supervised Learning: Towards Universal Multi-Modal Medical Data Representation Learning}},
  author    = {Ye, Yiwen and Xie, Yutong and Zhang, Jianpeng and Chen, Ziyang and Wu, Qi and Xia, Yong},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {11114--11124},
  doi       = {10.1109/CVPR52733.2024.01057},
  url       = {https://mlanthology.org/cvpr/2024/ye2024cvpr-continual/}
}