Disentangled Non-Local Neural Networks
Abstract
The non-local block is a popular module for strengthening the context modeling ability of a regular convolutional neural network. This paper first studies the non-local block in depth, where we find that its attention computation can be split into two terms: a whitened pairwise term accounting for the relationship between two pixels, and a unary term representing the saliency of every pixel. We also observe that the two terms, when trained alone, tend to model different visual cues, e.g., the whitened pairwise term learns within-region relationships while the unary term learns salient boundaries. However, the two terms are tightly coupled in the non-local block, which hinders the learning of each. Based on these findings, we present the disentangled non-local block, where the two terms are decoupled to facilitate learning for both terms. We demonstrate the effectiveness of the decoupled design on various tasks, including semantic segmentation, object detection and action recognition. The code will be made publicly available.
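The split described in the abstract can be checked numerically. The sketch below is not the authors' code. It assumes the usual dot-product form of non-local attention, where the weight from pixel i to pixel j is a softmax over j of the query-key product, and it uses hypothetical helper names (`standard_attention`, `split_attention`). Under that assumption, subtracting the mean query and mean key gives a whitened pairwise score, and the mean query dotted with each key gives a unary score that is the same for every query pixel. The remaining terms are constant over j and cancel inside the softmax.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def standard_attention(queries, keys):
    """Assumed baseline: softmax over j of q_i . k_j."""
    return [softmax([dot(q, k) for k in keys]) for q in queries]


def split_attention(queries, keys):
    """Same attention, written as whitened pairwise score + unary score."""
    mu_q = mean_vector(queries)
    mu_k = mean_vector(keys)
    # Unary score: depends only on the key pixel j, shared by all queries.
    unary = [dot(mu_q, k) for k in keys]
    rows = []
    for q in queries:
        # Whitened pairwise score: both embeddings have their mean removed.
        pairwise = [dot(sub(q, mu_q), sub(k, mu_k)) for k in keys]
        rows.append(softmax([p + u for p, u in zip(pairwise, unary)]))
    return rows


# Hypothetical toy data: 6 "pixels" with 4-dimensional embeddings.
random.seed(0)
queries = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]
keys = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]

a = standard_attention(queries, keys)
b = split_attention(queries, keys)
max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print(max_diff < 1e-9)
```

The two functions agree up to floating-point error, which shows that both terms sit inside a single softmax in the standard block. The sketch only illustrates this coupling and does not implement the disentangled block itself.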
Cite
Text
Yin et al. "Disentangled Non-Local Neural Networks." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58555-6_12
Markdown
[Yin et al. "Disentangled Non-Local Neural Networks." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/yin2020eccv-disentangled/) doi:10.1007/978-3-030-58555-6_12
BibTeX
@inproceedings{yin2020eccv-disentangled,
title = {{Disentangled Non-Local Neural Networks}},
author = {Yin, Minghao and Yao, Zhuliang and Cao, Yue and Li, Xiu and Zhang, Zheng and Lin, Stephen and Hu, Han},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2020},
doi = {10.1007/978-3-030-58555-6_12},
url = {https://mlanthology.org/eccv/2020/yin2020eccv-disentangled/}
}