Single-Stage Semantic Segmentation from Image Labels

Abstract

Recent years have seen rapid growth in new approaches that improve the accuracy of semantic segmentation in a weakly supervised setting, i.e., with only image-level labels available for training. However, this has come at the cost of increased model complexity and sophisticated multi-stage training procedures. This is in contrast to earlier work that used only a single stage -- training one segmentation network on image labels -- which was abandoned due to inferior segmentation accuracy. In this work, we first define three desirable properties of a weakly supervised method: local consistency, semantic fidelity, and completeness. Using these properties as guidelines, we then develop a segmentation-based network model and a self-supervised training scheme to train for semantic masks from image-level annotations in a single stage. We show that despite its simplicity, our method achieves results that are competitive with significantly more complex pipelines, substantially outperforming earlier single-stage methods.

Cite

Text

Araslanov and Roth. "Single-Stage Semantic Segmentation from Image Labels." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.00431

Markdown

[Araslanov and Roth. "Single-Stage Semantic Segmentation from Image Labels." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/araslanov2020cvpr-singlestage/) doi:10.1109/CVPR42600.2020.00431

BibTeX

@inproceedings{araslanov2020cvpr-singlestage,
  title     = {{Single-Stage Semantic Segmentation from Image Labels}},
  author    = {Araslanov, Nikita and Roth, Stefan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2020},
  doi       = {10.1109/CVPR42600.2020.00431},
  url       = {https://mlanthology.org/cvpr/2020/araslanov2020cvpr-singlestage/}
}