PSANet: Point-Wise Spatial Attention Network for Scene Parsing

Abstract

We notice that information flow in convolutional neural networks is restricted to local neighborhood regions due to the physical design of convolutional filters, which limits the overall understanding of complex scenes. In this paper, we propose the point-wise spatial attention network (PSANet) to relax this local neighborhood constraint. Each position on the feature map is connected to all the others through a self-adaptively learned attention mask. Moreover, bi-directional information propagation for scene parsing is enabled: information at other positions can be collected to help predict the current position and, vice versa, information at the current position can be distributed to assist the prediction of other positions. Our proposed approach achieves top performance on various competitive scene parsing datasets, including ADE20K, PASCAL VOC 2012, and Cityscapes, demonstrating its effectiveness and generality.
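The collect/distribute idea in the abstract can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' implementation: in the paper the per-position attention masks are predicted by convolutional layers, whereas here they are taken as a given tensor, and the function names `psa_collect` and `psa_distribute` are ours. Features are assumed flattened to shape `(H*W, C)`.

```python
import numpy as np

def psa_collect(features, attention):
    """Collect branch: each position i aggregates information from all
    positions j, weighted by the mask predicted *for* position i (rows).

    features:  (H*W, C) flattened feature map
    attention: (H*W, H*W) unnormalized attention logits
    """
    # Softmax-normalize each row so the weights over source positions sum to 1.
    weights = np.exp(attention - attention.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ features  # (H*W, C)

def psa_distribute(features, attention):
    """Distribute branch: each position j broadcasts its information to all
    positions i, weighted by the mask predicted *at* position j (columns)."""
    # Softmax-normalize each column: how strongly position j sends elsewhere.
    weights = np.exp(attention - attention.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True)
    return weights.T @ features  # (H*W, C)
```

With uniform (all-zero) logits, the collect branch reduces to global average pooling broadcast to every position, which makes the "every position sees every other position" relaxation of the local receptive field easy to verify.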

Cite

Text

Zhao et al. "PSANet: Point-Wise Spatial Attention Network for Scene Parsing." Proceedings of the European Conference on Computer Vision (ECCV), 2018. doi:10.1007/978-3-030-01240-3_17

Markdown

[Zhao et al. "PSANet: Point-Wise Spatial Attention Network for Scene Parsing." Proceedings of the European Conference on Computer Vision (ECCV), 2018.](https://mlanthology.org/eccv/2018/zhao2018eccv-psanet/) doi:10.1007/978-3-030-01240-3_17

BibTeX

@inproceedings{zhao2018eccv-psanet,
  title     = {{PSANet: Point-Wise Spatial Attention Network for Scene Parsing}},
  author    = {Zhao, Hengshuang and Zhang, Yi and Liu, Shu and Shi, Jianping and Change Loy, Chen and Lin, Dahua and Jia, Jiaya},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2018},
  doi       = {10.1007/978-3-030-01240-3_17},
  url       = {https://mlanthology.org/eccv/2018/zhao2018eccv-psanet/}
}