SCARF: A Semantic Constrained Attention Refinement Network for Semantic Segmentation

Abstract

Semantic segmentation has achieved great progress by exploiting contextual dependencies. In this paper, we propose an end-to-end Semantic Constrained Attention ReFinement (SCARF) network, based on semantic constrained contextual dependencies, to fully utilize the semantic information across different layers. Our novelties lie in the following aspects: Firstly, we present a general framework for capturing non-local contextual dependencies. Secondly, within the framework, we introduce an efficient Category Attention (CA) block that captures semantic-related context by using the category constraint from a coarse segmentation, which reduces the computational complexity from O(n^2) to O(n) for an image with n pixels. Thirdly, we overcome the contextual information confusion problem by adaptively balancing the non-local contextual dependencies and the local consistency using a category-wise learned weight. Finally, we fully utilize the multi-scale semantic-related contextual information by refining the segmentation iteratively across layers with the semantic constraint. Extensive evaluations demonstrate that our SCARF network significantly improves the segmentation results and achieves superior performance: 85.0% mIoU on PASCAL VOC 2012, 55.0% mIoU on PASCAL Context, and 82.1% mIoU on Cityscapes.
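
The complexity claim for the Category Attention block can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it only assumes one plausible reading of the abstract, in which each of the n pixels attends to k category centers pooled from a coarse segmentation rather than to all n pixels. The function names, the soft pooling of centers, and the scaled dot-product similarity are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def category_attention(features, coarse_logits):
    """Hypothetical category-constrained attention (illustrative only).

    features:      (n, d) array, one d-dimensional feature per pixel.
    coarse_logits: (n, k) array, coarse segmentation scores for k categories.
    Returns (context, attn) with shapes (n, d) and (n, k).
    """
    n, d = features.shape

    # Soft assignment of every pixel to every category.
    probs = softmax(coarse_logits, axis=1)                        # (n, k)

    # Category centers: probability-weighted mean of the pixel features.
    # Assumption: the abstract does not say how categories are pooled.
    weights = probs / (probs.sum(axis=0, keepdims=True) + 1e-8)   # (n, k)
    centers = weights.T @ features                                # (k, d)

    # Each pixel attends to k centers instead of n pixels, so the
    # similarity matrix is (n, k) rather than (n, n).
    attn = softmax(features @ centers.T / np.sqrt(d), axis=1)     # (n, k)

    # Context for each pixel is a mixture of the category centers.
    context = attn @ centers                                      # (n, d)
    return context, attn
```

With k fixed (for example, the number of classes in the dataset), every matrix product above costs O(n k d), which is linear in the number of pixels n; a full non-local block would instead build an (n, n) similarity matrix. The category-wise balancing weight and the iterative cross-layer refinement mentioned in the abstract are not modeled here.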

Cite

Text

Ding et al. "SCARF: A Semantic Constrained Attention Refinement Network for Semantic Segmentation." IEEE/CVF International Conference on Computer Vision Workshops, 2021. doi:10.1109/ICCVW54120.2021.00335

Markdown

[Ding et al. "SCARF: A Semantic Constrained Attention Refinement Network for Semantic Segmentation." IEEE/CVF International Conference on Computer Vision Workshops, 2021.](https://mlanthology.org/iccvw/2021/ding2021iccvw-scarf/) doi:10.1109/ICCVW54120.2021.00335

BibTeX

@inproceedings{ding2021iccvw-scarf,
  title     = {{SCARF: A Semantic Constrained Attention Refinement Network for Semantic Segmentation}},
  author    = {Ding, Xiaofeng and Shen, Chaomin and Che, Zhengping and Zeng, Tieyong and Peng, Yaxin},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2021},
  pages     = {3002-3011},
  doi       = {10.1109/ICCVW54120.2021.00335},
  url       = {https://mlanthology.org/iccvw/2021/ding2021iccvw-scarf/}
}