Combining Bottom-up, Top-Down, and Smoothness Cues for Weakly Supervised Image Segmentation

Abstract

This paper addresses the problem of weakly supervised semantic image segmentation. Our goal is to label every pixel in a new image, given only image-level object labels associated with the training images. This setting differs from standard semantic segmentation, where pixel-wise annotations are typically assumed to be available for training. We propose a novel deep architecture that fuses three distinct computational processes for semantic segmentation: (i) the bottom-up computation of neural activations in a CNN for image-level prediction of object classes; (ii) the top-down estimation of conditional likelihoods of the CNN's activations given the predicted objects, which yields a probabilistic attention map for each object class; and (iii) lateral attention-message passing from neighboring neurons in the same CNN layer. The fusion of (i)-(iii) is realized via a conditional random field formulated as a recurrent network, which aims to produce a smooth, boundary-preserving segmentation. Unlike existing work, we formulate unified end-to-end learning of all components of our deep architecture. Evaluation on the PASCAL VOC 2012 benchmark demonstrates that our approach outperforms reasonable weakly supervised baselines and state-of-the-art approaches.
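To make the idea of fusing per-class maps with neighbor-to-neighbor smoothing more concrete, here is a deliberately simplified sketch. It is not the authors' model. It assumes a 4-connected Potts CRF on a pixel grid, and the function names `mean_field_crf` and `neighbor_sum` and the parameters `weight` and `num_iters` are hypothetical. The input `unary_logits` stands in for log class scores such as the top-down attention maps. Each mean-field iteration plays the role of one step of the recurrent unrolling: every pixel receives "messages" (the current class beliefs) from its neighbors and renormalizes.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax over the class axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def neighbor_sum(q):
    # Hypothetical helper: sum of class beliefs from the 4-connected
    # neighbors of each pixel (pixels outside the image contribute 0).
    # q has shape (C, H, W).
    s = np.zeros_like(q)
    s[:, 1:, :] += q[:, :-1, :]   # message from the pixel above
    s[:, :-1, :] += q[:, 1:, :]   # message from the pixel below
    s[:, :, 1:] += q[:, :, :-1]   # message from the pixel to the left
    s[:, :, :-1] += q[:, :, 1:]   # message from the pixel to the right
    return s

def mean_field_crf(unary_logits, weight=1.0, num_iters=5):
    """Hypothetical sketch: mean-field inference in a 4-connected Potts CRF.

    unary_logits: (C, H, W) per-class scores (assumed to be log attention maps).
    weight: assumed Potts penalty for neighboring pixels taking different labels.
    Each loop iteration corresponds to one unrolled 'recurrent' step.
    """
    q = softmax(unary_logits, axis=0)
    for _ in range(num_iters):
        # Under a Potts penalty w * [l != l'], the mean-field update for label l
        # is unary + w * (neighbor mass on l), up to a per-pixel constant
        # that cancels inside the softmax.
        q = softmax(unary_logits + weight * neighbor_sum(q), axis=0)
    return q

# Usage: a 3x3 image with two classes. Class 0 is favored everywhere,
# except that the center pixel's unary mildly prefers class 1.
logits = np.zeros((2, 3, 3))
logits[0] = 2.0
logits[1, 1, 1] = 2.5
labels = mean_field_crf(logits, weight=1.0, num_iters=5).argmax(axis=0)
print(labels)  # every pixel, including the center, is labeled 0
```

Without smoothing, the center pixel would be labeled 1. The neighbors' strong agreement on class 0 overrides its weak unary preference, which is the kind of label-consistency effect a CRF layer contributes. The paper's actual pairwise terms and learned parameters may differ from this grid Potts model.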

Cite

Text

Roy and Todorovic. "Combining Bottom-up, Top-Down, and Smoothness Cues for Weakly Supervised Image Segmentation." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.770

Markdown

[Roy and Todorovic. "Combining Bottom-up, Top-Down, and Smoothness Cues for Weakly Supervised Image Segmentation." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/roy2017cvpr-combining/) doi:10.1109/CVPR.2017.770

BibTeX

@inproceedings{roy2017cvpr-combining,
  title     = {{Combining Bottom-up, Top-Down, and Smoothness Cues for Weakly Supervised Image Segmentation}},
  author    = {Roy, Anirban and Todorovic, Sinisa},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2017},
  doi       = {10.1109/CVPR.2017.770},
  url       = {https://mlanthology.org/cvpr/2017/roy2017cvpr-combining/}
}