Non-Local Neural Networks with Grouped Bilinear Attentional Transforms
Abstract
Modeling long-range spatial or temporal dependencies plays a key role in deep neural networks. The dominant conventional solutions are recurrent operations on sequential data and deep stacks of convolutional layers with small kernels. Recently, a number of non-local operators (such as those based on self-attention) have been devised. They are typically generic and can be plugged into many existing network pipelines to compute interactions between any two neurons in a feature map. This work proposes a novel non-local operator. It is inspired by the attention mechanism of the human visual system, which can quickly attend to important local parts of a scene and suppress less relevant information. The core of our method is a learnable, data-adaptive bilinear attentional transform (BA-Transform), whose merits are threefold. First, the BA-Transform is versatile enough to model a wide spectrum of local or global attentional operations, such as emphasizing specific local regions, and each BA-Transform is learned in a data-adaptive way. Second, to address the discrepancy among features, we further design grouped BA-Transforms, which apply different attentional operations to different groups of feature channels. Third, whereas many existing non-local operators are computation-intensive, the proposed BA-Transform is implemented by simple matrix multiplication and admits better efficiency. For empirical evaluation, we perform comprehensive experiments on two large-scale benchmarks, ImageNet and Kinetics, for image and video classification respectively. The achieved accuracies and various ablation experiments consistently demonstrate improvements by large margins.
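As a rough illustration of what a grouped transform "implemented by simple matrix multiplication" could look like, the sketch below applies a separate pair of matrices to each channel group. The abstract does not state the formula, so the form `y = p @ x @ q`, the function names, and the hand-written matrices are all assumptions made for illustration; in the described method the matrices would be learned from the input rather than fixed.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def grouped_bilinear_transform(x, p, q):
    """Apply y[g][c] = p[g] @ x[g][c] @ q[g] for every group g and channel c.

    x: G groups, each a list of C channel maps of size H x W
    p: G left matrices (H x H); q: G right matrices (W x W)
    NOTE: hypothetical form, assumed for illustration only.
    """
    return [[matmul(matmul(p_g, chan), q_g) for chan in x_g]
            for x_g, p_g, q_g in zip(x, p, q)]


# One group, one channel, a 2 x 2 feature map.
x = [[[[1.0, 2.0], [3.0, 4.0]]]]
p = [[[1.0, 0.0], [0.0, 0.0]]]  # left matrix keeps row 0 and zeroes row 1
q = [[[1.0, 0.0], [0.0, 1.0]]]  # identity on the right
y = grouped_bilinear_transform(x, p, q)
```

Here the left matrix acts on rows and the right matrix on columns, so a suitable pair can emphasize or suppress a region of the map, and each group can use a different pair.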
Cite
Text
Chi et al. "Non-Local Neural Networks with Grouped Bilinear Attentional Transforms." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.01182
Markdown
[Chi et al. "Non-Local Neural Networks with Grouped Bilinear Attentional Transforms." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/chi2020cvpr-nonlocal/) doi:10.1109/CVPR42600.2020.01182
BibTeX
@inproceedings{chi2020cvpr-nonlocal,
title = {{Non-Local Neural Networks with Grouped Bilinear Attentional Transforms}},
author = {Chi, Lu and Yuan, Zehuan and Mu, Yadong and Wang, Changhu},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2020},
doi = {10.1109/CVPR42600.2020.01182},
url = {https://mlanthology.org/cvpr/2020/chi2020cvpr-nonlocal/}
}