AttentionShift: Iteratively Estimated Part-Based Attention Map for Pointly Supervised Instance Segmentation

Abstract

Pointly supervised instance segmentation (PSIS) learns to segment objects using a single point within the object extent as supervision. Because the semantics of object parts vary considerably, however, a single supervision point causes semantic bias and false segmentation. In this study, we propose AttentionShift, a method that addresses the semantic bias issue by iteratively decomposing the instance attention map into parts and estimating the fine-grained semantics of each part. AttentionShift consists of two modules plugged into a vision transformer backbone: (i) token querying, which generates a pointly supervised attention map, and (ii) key-point shift, which re-estimates part-based attention maps by key-point filtering in the feature space. These two steps are performed iteratively so that the part-based attention maps are optimized both spatially and in the feature space to cover the full object extent. Experiments on the PASCAL VOC and MS COCO 2017 datasets show that AttentionShift improves on the state of the art by 7.7% and 4.8%, respectively, under mAP@0.5, setting a solid PSIS baseline using a vision transformer. Code is enclosed in the supplementary material.
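To make the idea of iteratively re-estimating part-based attention in feature space more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the key-point filtering rule, so a mean-shift-style update is swapped in as a stand-in. The names `shift_keypoints` and `part_attention`, the Gaussian kernel, and the `bandwidth` parameter are all assumptions made for illustration.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def shift_keypoints(features, attention, keypoints, n_iters=5, bandwidth=1.0):
    """Mean-shift-style refinement of key-points in feature space.

    Illustrative stand-in only: each key-point moves to the mean of the token
    features, weighted by instance attention and feature-space proximity.
    """
    keypoints = [list(k) for k in keypoints]
    for _ in range(n_iters):
        updated = []
        for k in keypoints:
            # Weight each token by its attention and its closeness to k.
            w = [a * math.exp(-_sq_dist(f, k) / (2 * bandwidth ** 2))
                 for f, a in zip(features, attention)]
            total = sum(w)
            if total == 0.0:
                updated.append(k)  # no supporting tokens: leave k in place
                continue
            updated.append([
                sum(wi * f[d] for wi, f in zip(w, features)) / total
                for d in range(len(k))
            ])
        keypoints = updated
    return keypoints


def part_attention(features, attention, keypoints, bandwidth=1.0):
    """Split the instance attention into one map per key-point (soft assignment)."""
    maps = [[0.0] * len(features) for _ in keypoints]
    for i, (f, a) in enumerate(zip(features, attention)):
        kw = [math.exp(-_sq_dist(f, k) / (2 * bandwidth ** 2)) for k in keypoints]
        total = sum(kw)
        for j, w in enumerate(kw):
            maps[j][i] = a * w / total if total > 0 else 0.0
    return maps
```

In this sketch, `features` is a list of per-token feature vectors, `attention` is the instance attention value of each token, and `keypoints` holds initial part prototypes. Alternating `shift_keypoints` and `part_attention` mirrors, in spirit only, the alternation between the two steps described above.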

Cite

Text

Liao et al. "AttentionShift: Iteratively Estimated Part-Based Attention Map for Pointly Supervised Instance Segmentation." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01870

Markdown

[Liao et al. "AttentionShift: Iteratively Estimated Part-Based Attention Map for Pointly Supervised Instance Segmentation." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/liao2023cvpr-attentionshift/) doi:10.1109/CVPR52729.2023.01870

BibTeX

@inproceedings{liao2023cvpr-attentionshift,
  title     = {{AttentionShift: Iteratively Estimated Part-Based Attention Map for Pointly Supervised Instance Segmentation}},
  author    = {Liao, Mingxiang and Guo, Zonghao and Wang, Yuze and Yuan, Peng and Feng, Bailan and Wan, Fang},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {19519-19528},
  doi       = {10.1109/CVPR52729.2023.01870},
  url       = {https://mlanthology.org/cvpr/2023/liao2023cvpr-attentionshift/}
}