TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs

Abstract

Attention modules for Convolutional Neural Networks (CNNs) are an effective method to enhance performance on multiple computer-vision tasks. While existing methods appropriately model channel-, spatial- and self-attention, they primarily operate in a feedforward, bottom-up manner. Consequently, the attention mechanism depends strongly on the local information of a single input feature map and does not incorporate the semantically richer contextual information available at higher layers, which can specify “what and where to look” in lower-level feature maps through top-down information flow. Accordingly, in this work, we propose a lightweight top-down attention module (TDAM) that iteratively generates a “visual searchlight” to perform channel and spatial modulation of its inputs, and that outputs more contextually relevant feature maps at each computation step. Our experiments indicate that TDAM enhances the performance of CNNs across multiple object-recognition benchmarks and outperforms prominent attention modules while being more parameter- and memory-efficient. Further, TDAM-based models learn to “shift attention” by localizing individual objects or features at each computation step without any explicit supervision, resulting in a 5% improvement for ResNet50 on weakly-supervised object localization. Source code and models are publicly available at: https://github.com/shantanuj/TDAM_Top_down_attention_module
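To make the idea of iterative top-down modulation concrete, the following is a minimal NumPy sketch, not the authors' implementation (see the repository linked above for that). It assumes a single lower-level feature map `x`, a 1x1 projection `w_up` standing in for the higher layers, and two hypothetical weight arrays, `w_channel` and `w_spatial`, that turn higher-level context into channel and spatial gates. All function names, parameter names, and shapes here are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic function, used to produce gates in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def top_down_attention(x, w_up, w_channel, w_spatial, steps=2):
    """Illustrative top-down modulation loop (hypothetical, not TDAM itself).

    x:         lower-level feature map, shape (C, H, W)
    w_up:      1x1 projection to higher-level features, shape (C_high, C)
    w_channel: maps pooled higher-level context to channel gates, shape (C, C_high)
    w_spatial: maps higher-level features to a spatial gate, shape (C_high,)
    steps:     number of computation steps (attention iterations)
    """
    modulated = x
    for _ in range(steps):
        # Bottom-up pass: a 1x1 projection stands in for the higher layers.
        high = np.tanh(np.einsum("kc,chw->khw", w_up, modulated))

        # Top-down channel gate ("what to look at") from pooled context.
        context = high.mean(axis=(1, 2))             # shape (C_high,)
        channel_gate = sigmoid(w_channel @ context)  # shape (C,)

        # Top-down spatial gate ("where to look") from per-location features.
        spatial_gate = sigmoid(np.einsum("k,khw->hw", w_spatial, high))  # (H, W)

        # Re-modulate the original input with the current gates.
        modulated = x * channel_gate[:, None, None] * spatial_gate[None, :, :]
    return modulated


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 5, 5))
    out = top_down_attention(
        x,
        w_up=rng.standard_normal((6, 4)),
        w_channel=rng.standard_normal((4, 6)),
        w_spatial=rng.standard_normal(6),
        steps=3,
    )
    print(out.shape)
```

Because both gates lie strictly between 0 and 1, each step can only attenuate the input features; the gates change between steps because the higher-level features are recomputed from the previously modulated map, which is one simple way to picture attention "shifting" across computation steps.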

Cite

Text

Jaiswal et al. "TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19806-9_15

Markdown

[Jaiswal et al. "TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/jaiswal2022eccv-tdam/) doi:10.1007/978-3-031-19806-9_15

BibTeX

@inproceedings{jaiswal2022eccv-tdam,
  title     = {{TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs}},
  author    = {Jaiswal, Shantanu and Fernando, Basura and Tan, Cheston},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-19806-9_15},
  url       = {https://mlanthology.org/eccv/2022/jaiswal2022eccv-tdam/}
}