What's the Point: Semantic Segmentation with Point Supervision
Abstract
The semantic image segmentation task presents a trade-off between test-time accuracy and training-time annotation cost. Detailed per-pixel annotations enable training accurate models but are very time-consuming to obtain; image-level class labels are an order of magnitude cheaper but result in less accurate models. We take a natural step from image-level annotation towards stronger supervision: we ask annotators to point to an object if one exists. We incorporate this point supervision along with a novel objectness potential in the training loss function of a CNN model. Experimental results on the PASCAL VOC 2012 benchmark reveal that the combined effect of point-level supervision and objectness potential yields an improvement of \(12.9\,\%\) mIOU over image-level supervision. Further, we demonstrate that models trained with point-level supervision are more accurate than models trained with image-level, squiggle-level or full supervision given a fixed annotation budget.
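The mIOU figure quoted in the abstract abbreviates mean intersection-over-union. As a rough illustration of the metric only, the sketch below computes it for a single flattened label map in plain Python. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions made for this example; benchmark evaluations typically accumulate counts over the whole dataset, so this is not the paper's evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union for one labeling (illustrative sketch).

    pred and target are flat sequences of integer class labels, one per pixel.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union == 0:
            continue  # assumption: ignore classes absent from both maps
        ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with two classes and four pixels.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so `score` is their average, 7/12.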
Cite
Text
Bearman et al. "What's the Point: Semantic Segmentation with Point Supervision." European Conference on Computer Vision, 2016. doi:10.1007/978-3-319-46478-7_34
Markdown
[Bearman et al. "What's the Point: Semantic Segmentation with Point Supervision." European Conference on Computer Vision, 2016.](https://mlanthology.org/eccv/2016/bearman2016eccv-point/) doi:10.1007/978-3-319-46478-7_34
BibTeX
@inproceedings{bearman2016eccv-point,
title = {{What's the Point: Semantic Segmentation with Point Supervision}},
author = {Bearman, Amy L. and Russakovsky, Olga and Ferrari, Vittorio and Fei-Fei, Li},
booktitle = {European Conference on Computer Vision},
year = {2016},
pages = {549-565},
doi = {10.1007/978-3-319-46478-7_34},
url = {https://mlanthology.org/eccv/2016/bearman2016eccv-point/}
}