Less Is More: Pay Less Attention in Vision Transformers

Abstract

Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision. However, Transformer training and inference in previous works can be prohibitively expensive due to the quadratic complexity of self-attention over a long sequence of representations, especially for high-resolution dense prediction tasks. To this end, we present a novel Less attention vIsion Transformer (LIT), building upon the fact that the early self-attention layers in recent hierarchical vision Transformers still focus on local patterns and bring only minor benefits. Specifically, we propose a hierarchical Transformer where we use pure multi-layer perceptrons (MLPs) to encode rich local patterns in the early stages, while applying self-attention modules to capture longer-range dependencies in the deeper layers. Moreover, we further propose a learned deformable token merging module to adaptively fuse informative patches in a non-uniform manner. The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation, serving as a strong backbone for many vision tasks. Code is available at https://github.com/zip-group/LIT.
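
The abstract describes the core design: attention-free MLP blocks in the early stages and standard self-attention blocks in the later stages. Below is a minimal PyTorch-style sketch of that idea only; the module names, stage depths, and dimensions are illustrative assumptions rather than the authors' implementation, and the deformable token merging module is omitted (see the linked repository for the official code).

# Sketch: early stages without self-attention, later stages with self-attention.
# All hyperparameters here are placeholders, not the paper's configuration.
import torch
import torch.nn as nn

class MLPBlock(nn.Module):
    """Early-stage block: a pre-norm MLP with a residual connection, no attention."""
    def __init__(self, dim, mlp_ratio=4.0):
        super().__init__()
        hidden = int(dim * mlp_ratio)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):  # x: (batch, num_tokens, dim)
        return x + self.mlp(self.norm(x))

class AttentionBlock(nn.Module):
    """Later-stage block: pre-norm multi-head self-attention followed by an MLP."""
    def __init__(self, dim, num_heads=4, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        hidden = int(dim * mlp_ratio)
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):
        y = self.norm1(x)
        x = x + self.attn(y, y, y, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

# Toy stack: two attention-free blocks, then two self-attention blocks.
dim = 64
blocks = nn.Sequential(MLPBlock(dim), MLPBlock(dim), AttentionBlock(dim), AttentionBlock(dim))
tokens = torch.randn(2, 196, dim)   # (batch, num_patches, dim)
print(blocks(tokens).shape)         # torch.Size([2, 196, 64])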

Cite

Text

Pan et al. "Less Is More: Pay Less Attention in Vision Transformers." AAAI Conference on Artificial Intelligence, 2022. doi:10.1609/AAAI.V36I2.20099

Markdown

[Pan et al. "Less Is More: Pay Less Attention in Vision Transformers." AAAI Conference on Artificial Intelligence, 2022.](https://mlanthology.org/aaai/2022/pan2022aaai-less/) doi:10.1609/AAAI.V36I2.20099

BibTeX

@inproceedings{pan2022aaai-less,
  title     = {{Less Is More: Pay Less Attention in Vision Transformers}},
  author    = {Pan, Zizheng and Zhuang, Bohan and He, Haoyu and Liu, Jing and Cai, Jianfei},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {2035--2043},
  doi       = {10.1609/AAAI.V36I2.20099},
  url       = {https://mlanthology.org/aaai/2022/pan2022aaai-less/}
}