Toward Joint Thing-and-Stuff Mining for Weakly Supervised Panoptic Segmentation

Abstract

Panoptic segmentation aims to partition an image into object instances for thing categories and semantic regions for stuff categories. To date, learning weakly supervised panoptic segmentation (WSPS) with only image-level labels remains unexplored. In this paper, we propose an efficient joint thing-and-stuff mining (JTSM) framework for WSPS. To this end, we design a novel mask-of-interest pooling (MoIPool) that extracts fixed-size, pixel-accurate feature maps from segmentations of arbitrary shape. MoIPool enables a panoptic mining branch to use multiple instance learning (MIL) to recognize thing and stuff segments in a unified manner. We further refine the segmentation masks with parallel instance and semantic segmentation branches via self-training, which combines the masks mined by the panoptic mining branch with bottom-up object evidence as pseudo-ground-truth labels to improve spatial coherence and contour localization. Experimental results demonstrate the effectiveness of JTSM on PASCAL VOC and MS COCO. As a by-product, we achieve competitive results for weakly supervised object detection and instance segmentation. This work is a first step toward tackling the challenging panoptic segmentation task with only image-level labels.
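The abstract describes two mechanisms: pooling an arbitrary-shape mask into a fixed-size feature grid, and scoring thing and stuff segments with MIL from image-level labels alone. The pure-Python sketch below is a rough illustration of both. It is not the authors' implementation, and the function names `moi_pool`, `mil_image_scores`, and `mil_loss` are hypothetical. Two further details are assumptions. First, the masked-average binning over the mask's bounding box is our own choice. Second, the MIL head is a WSDDN-style two-stream design, chosen for illustration because the abstract does not specify the exact head.

```python
import math
from typing import List

Grid = List[List[float]]


def moi_pool(features: List[Grid], mask: List[List[int]], out_size: int = 2) -> List[Grid]:
    """Hypothetical mask-of-interest pooling (assumed design, not the paper's code).

    features: C channels, each an H x W grid; mask: H x W binary mask of one segment.
    Splits the mask's bounding box into out_size x out_size bins and averages only
    pixels where mask == 1, so background pixels inside the box are ignored.
    Bins that contain no mask pixels are set to 0.0.
    """
    coords = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return [[[0.0] * out_size for _ in range(out_size)] for _ in features]
    y0, y1 = min(y for y, _ in coords), max(y for y, _ in coords) + 1
    x0, x1 = min(x for _, x in coords), max(x for _, x in coords) + 1
    h, w = y1 - y0, x1 - x0

    def edges(i: int, start: int, length: int) -> range:
        # RoIPool-style floor/ceil bin edges, so no bin range is empty when length < out_size.
        lo = start + (i * length) // out_size
        hi = start - (-((i + 1) * length) // out_size)
        return range(lo, hi)

    pooled = []
    for channel in features:
        grid = []
        for by in range(out_size):
            row = []
            for bx in range(out_size):
                vals = [channel[y][x] for y in edges(by, y0, h)
                        for x in edges(bx, x0, w) if mask[y][x]]
                row.append(sum(vals) / len(vals) if vals else 0.0)
            grid.append(row)
        pooled.append(grid)
    return pooled


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mil_image_scores(cls_logits: Grid, det_logits: Grid) -> List[float]:
    """WSDDN-style two-stream MIL (an assumption, used here for illustration).

    Both inputs are R segments x K classes; segments may be thing or stuff regions.
    The classification stream is a softmax over classes, and the detection stream is
    a softmax over segments. Their product is summed over segments per class.
    """
    num_classes = len(cls_logits[0])
    cls = [softmax(row) for row in cls_logits]
    det_cols = [softmax([row[k] for row in det_logits]) for k in range(num_classes)]
    return [sum(cls[r][k] * det_cols[k][r] for r in range(len(cls)))
            for k in range(num_classes)]


def mil_loss(scores: List[float], labels: List[int], eps: float = 1e-6) -> float:
    """Mean binary cross-entropy between image-level scores and image-level labels."""
    total = 0.0
    for s, y in zip(scores, labels):
        s = min(max(s, eps), 1.0 - eps)
        total += -(y * math.log(s) + (1 - y) * math.log(1.0 - s))
    return total / len(scores)


feats = [[[float(4 * y + x) for x in range(4)] for y in range(4)]]
mask = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(moi_pool(feats, mask))  # [[[0.0, 1.0], [4.0, 0.0]]]
scores = mil_image_scores([[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
print(scores)  # [0.5, 0.5]
print(mil_loss(scores, [1, 0]))  # about 0.693 (ln 2)
```

Masked averaging differs from plain RoIPool on the same bounding box. RoIPool would mix in background pixels that lie inside the box but outside the segment. Masked pooling keeps each bin tied to the segment's own pixels, which matters for irregular stuff regions and thin objects.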

Cite

Text

Shen et al. "Toward Joint Thing-and-Stuff Mining for Weakly Supervised Panoptic Segmentation." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.01642

Markdown

[Shen et al. "Toward Joint Thing-and-Stuff Mining for Weakly Supervised Panoptic Segmentation." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/shen2021cvpr-joint/) doi:10.1109/CVPR46437.2021.01642

BibTeX

@inproceedings{shen2021cvpr-joint,
  title     = {{Toward Joint Thing-and-Stuff Mining for Weakly Supervised Panoptic Segmentation}},
  author    = {Shen, Yunhang and Cao, Liujuan and Chen, Zhiwei and Lian, Feihong and Zhang, Baochang and Su, Chi and Wu, Yongjian and Huang, Feiyue and Ji, Rongrong},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {16694-16705},
  doi       = {10.1109/CVPR46437.2021.01642},
  url       = {https://mlanthology.org/cvpr/2021/shen2021cvpr-joint/}
}