Leveraging Large-Scale Weakly Labeled Data for Semi-Supervised Mass Detection in Mammograms
Abstract
Mammographic mass detection is an integral part of computer-aided diagnosis systems. Annotating a large number of mammograms at the pixel level to train a mass detection model in a fully supervised fashion is costly and time-consuming. This paper presents a novel self-training framework for semi-supervised mass detection that uses soft image-level labels generated from diagnosis reports by Mammo-RoBERTa, a RoBERTa-based natural language processing model fine-tuned on the fully labeled data and the associated mammography reports. Starting from a fully supervised model trained on the data with pixel-level masks, the proposed framework iteratively refines the model on the entire weakly labeled dataset (with image-level soft labels) in a self-training fashion. A novel sample selection strategy identifies the most informative samples for each iteration, based on the current model output and the soft labels of the weakly labeled data. A soft cross-entropy loss and a soft focal loss are also designed to serve as the image-level and pixel-level classification losses, respectively. Our experimental results show that the proposed semi-supervised framework can improve mass detection accuracy over the supervised baseline and outperforms previous state-of-the-art semi-supervised approaches that use weakly labeled data, in some cases by a large margin.
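The abstract names a soft cross-entropy loss (image level) and a soft focal loss (pixel level) but does not give their formulas. The sketch below is a minimal illustration, assuming the common generalization of binary cross-entropy and the standard focal loss in which the hard 0/1 target is replaced by a soft probability `q` in [0, 1] (for example, a report-derived score). The function names `soft_cross_entropy`, `soft_focal_loss` and `mean_loss`, and the default `gamma=2.0`, are hypothetical choices for illustration, not the paper's definitions.

```python
import math
from typing import Callable, Sequence

# Assumption: generic soft-target losses, not the exact formulation from the paper.
EPS = 1e-7  # clamp probabilities away from 0 and 1 so log() stays finite


def soft_cross_entropy(p: float, q: float) -> float:
    """Binary cross-entropy between a predicted probability p and a soft target q in [0, 1]."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(q * math.log(p) + (1.0 - q) * math.log(1.0 - p))


def soft_focal_loss(p: float, q: float, gamma: float = 2.0) -> float:
    """Focal-style loss with a soft target q.

    The positive term is down-weighted by (1 - p)^gamma and the negative term by
    p^gamma, so confidently correct predictions contribute little to the loss.
    With gamma = 0 this reduces exactly to soft_cross_entropy.
    """
    p = min(max(p, EPS), 1.0 - EPS)
    pos = -q * (1.0 - p) ** gamma * math.log(p)
    neg = -(1.0 - q) * p ** gamma * math.log(1.0 - p)
    return pos + neg


def mean_loss(
    loss_fn: Callable[..., float],
    preds: Sequence[float],
    targets: Sequence[float],
    **kwargs: float,
) -> float:
    """Average a per-element loss, e.g. over the pixels of a predicted mass heatmap."""
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and the same length")
    return sum(loss_fn(p, q, **kwargs) for p, q in zip(preds, targets)) / len(preds)


# Hypothetical usage: four pixel probabilities against soft per-pixel targets.
pixel_loss = mean_loss(soft_focal_loss, [0.2, 0.8, 0.6, 0.1], [0.0, 0.7, 0.7, 0.0], gamma=2.0)
```

Setting `gamma = 0` recovers the soft cross-entropy, while larger values shift the training signal toward poorly predicted pixels, which is the usual motivation for a focal term in dense prediction where background pixels dominate.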
Cite
Tang et al. "Leveraging Large-Scale Weakly Labeled Data for Semi-Supervised Mass Detection in Mammograms." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00385
[Tang et al. "Leveraging Large-Scale Weakly Labeled Data for Semi-Supervised Mass Detection in Mammograms." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/tang2021cvpr-leveraging/) doi:10.1109/CVPR46437.2021.00385
@inproceedings{tang2021cvpr-leveraging,
title = {{Leveraging Large-Scale Weakly Labeled Data for Semi-Supervised Mass Detection in Mammograms}},
author = {Tang, Yuxing and Cao, Zhenjie and Zhang, Yanbo and Yang, Zhicheng and Ji, Zongcheng and Wang, Yiwei and Han, Mei and Ma, Jie and Xiao, Jing and Chang, Peng},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2021},
pages = {3855-3864},
doi = {10.1109/CVPR46437.2021.00385},
url = {https://mlanthology.org/cvpr/2021/tang2021cvpr-leveraging/}
}