Scaling Supervision for Free: Leveraging Universal Segmentation Models for Enhanced Medical Image Diagnosis

Abstract

Deep learning-based medical image analysis has been constrained by the limited availability of large-scale annotated data. While recent advances in large language models have enabled scaling automatic extraction of diagnostic labels from reports, we propose that scaling other forms of supervision could be an equally important yet unexplored direction. Inspired by the success of foundation models, we leverage a modern universal segmentation model to scale anatomical segmentation as an additional supervision signal during training. Through extensive experiments on three large-scale CT datasets totaling 58K+ volumes, we demonstrate that incorporating this "free" anatomical supervision consistently improves the performance of various mainstream architectures (ResNet, ViT, and Swin Transformer) by up to 12.74%, with particularly significant gains for Transformer-based models and anatomically localized abnormalities. Inference efficiency is maintained because the segmentation branch is used only during training. This work opens up a new direction for scaling in medical imaging and demonstrates how existing universal segmentation models can be repurposed to enhance diagnostic models at virtually no additional cost.
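The training setup described in the abstract, a diagnostic objective plus an auxiliary anatomical segmentation objective that is dropped at inference, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the choice of binary cross-entropy and soft Dice as the two losses, the `seg_weight` balancing factor, and all function names are hypothetical.

```python
import math

def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over multi-label diagnostic predictions."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

def soft_dice_loss(pred_mask, pseudo_mask, eps=1e-7):
    """1 - soft Dice between a predicted mask and a pseudo-label mask.

    Both masks are flattened lists of per-voxel values in [0, 1]. The
    pseudo-label mask stands in for the output of an off-the-shelf
    universal segmentation model (assumed, not specified in the abstract).
    """
    intersection = sum(p * t for p, t in zip(pred_mask, pseudo_mask))
    denominator = sum(pred_mask) + sum(pseudo_mask)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

def training_loss(cls_probs, cls_labels, seg_pred, seg_pseudo, seg_weight=1.0):
    """Hypothetical joint objective: diagnosis loss + weighted segmentation loss."""
    return bce_loss(cls_probs, cls_labels) + seg_weight * soft_dice_loss(seg_pred, seg_pseudo)

def inference_outputs(cls_probs, threshold=0.5):
    """At inference only the diagnostic head is used; no segmentation is computed."""
    return [int(p >= threshold) for p in cls_probs]
```

Under these assumptions, the segmentation term only shapes the shared features during training, so the deployed model has the same inference cost as a plain classifier.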

Cite

Text

Li et al. "Scaling Supervision for Free: Leveraging Universal Segmentation Models for Enhanced Medical Image Diagnosis." Medical Imaging with Deep Learning, 2025.

Markdown

[Li et al. "Scaling Supervision for Free: Leveraging Universal Segmentation Models for Enhanced Medical Image Diagnosis." Medical Imaging with Deep Learning, 2025.](https://mlanthology.org/midl/2025/li2025midl-scaling/)

BibTeX

@inproceedings{li2025midl-scaling,
  title     = {{Scaling Supervision for Free: Leveraging Universal Segmentation Models for Enhanced Medical Image Diagnosis}},
  author    = {Li, Yingtai and Ming, Shuai and Lai, Haoran and Tang, Fenghe and Wei, Wei and Zhou, S Kevin},
  booktitle = {Medical Imaging with Deep Learning},
  year      = {2025},
  url       = {https://mlanthology.org/midl/2025/li2025midl-scaling/}
}