Semantic Segmentation by Early Region Proxy

CVPR 2022 pp. 1258-1268

doi:10.1109/CVPR52688.2022.00132

Abstract

Typical vision backbones manipulate structured features. As a compromise, semantic segmentation has long been modeled as per-point prediction on dense regular grids. In this work, we present a novel and efficient formulation that starts from interpreting the image as a tessellation of learnable regions, each of which has flexible geometry and carries homogeneous semantics. To model region-wise context, we use a Transformer to encode regions in a sequence-to-sequence manner by applying multi-layer self-attention to the region embeddings, which serve as proxies of specific regions. Semantic segmentation is then carried out as per-region prediction on top of the encoded region embeddings using a single linear classifier, so a decoder is no longer needed. The proposed RegProxy model discards the common Cartesian feature layout and operates purely at the region level. Hence, it exhibits the most competitive performance-efficiency trade-off compared with conventional dense prediction methods. For example, on ADE20K, the small-sized RegProxy-S/16 outperforms the best CNN model using 25% of the parameters and 4% of the computation, while the largest RegProxy-L/16 achieves 52.9 mIoU, which outperforms the state of the art by 2.1% with fewer resources. Code and models are available at https://github.com/YiF-Zhang/RegionProxy.
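
The per-region prediction described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a single attention head, random weights, and a pixel-to-region soft assignment matrix that is simply given, standing in for the learnable region geometry. The names `self_attention`, `segment`, and `pixel_to_region` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over region embeddings of shape (R, D)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (R, R)
    return tokens + attn @ v  # residual connection

def segment(region_tokens, pixel_to_region, w_cls, b_cls, attn_weights):
    """Classify regions with one linear layer, then map the result to pixels.

    region_tokens:   (R, D) one embedding (proxy) per region
    pixel_to_region: (P, R) assignment of pixels to regions, rows sum to 1
                     (assumed given here; a stand-in for learned regions)
    w_cls, b_cls:    (D, C) and (C,) weights of the single linear classifier
    attn_weights:    list of (w_q, w_k, w_v) tuples, one per attention layer
    """
    x = region_tokens
    for w_q, w_k, w_v in attn_weights:
        x = self_attention(x, w_q, w_k, w_v)
    region_logits = x @ w_cls + b_cls               # (R, C) per-region prediction
    pixel_logits = pixel_to_region @ region_logits  # (P, C) no decoder involved
    return pixel_logits.argmax(axis=-1), region_logits

# Toy sizes (hypothetical): 16 regions, 32-dim embeddings, 5 classes, 64 pixels.
rng = np.random.default_rng(0)
R, D, C, P = 16, 32, 5, 64
tokens = rng.normal(size=(R, D))
layers = [tuple(rng.normal(scale=0.1, size=(D, D)) for _ in range(3))
          for _ in range(2)]
w_cls, b_cls = rng.normal(size=(D, C)), np.zeros(C)
soft_assign = softmax(rng.normal(size=(P, R)), axis=-1)

labels, region_logits = segment(tokens, soft_assign, w_cls, b_cls, layers)
print(labels.shape, region_logits.shape)
```

The point of the sketch is that class scores are computed once per region rather than once per pixel, and the pixel-level map is obtained by a single matrix product with the assignment matrix.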

Cite

Text

Zhang et al. "Semantic Segmentation by Early Region Proxy." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00132

Markdown

[Zhang et al. "Semantic Segmentation by Early Region Proxy." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/zhang2022cvpr-semantic/) doi:10.1109/CVPR52688.2022.00132

BibTeX

@inproceedings{zhang2022cvpr-semantic,
  title     = {{Semantic Segmentation by Early Region Proxy}},
  author    = {Zhang, Yifan and Pang, Bo and Lu, Cewu},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {1258-1268},
  doi       = {10.1109/CVPR52688.2022.00132},
  url       = {https://mlanthology.org/cvpr/2022/zhang2022cvpr-semantic/}
}