AiATrack: Attention in Attention for Transformer Visual Tracking

Abstract

Transformer trackers have achieved impressive advancements recently, where the attention mechanism plays an important role. However, the independent correlation computation in the attention mechanism could result in noisy and ambiguous attention weights, which inhibits further performance improvement. To address this issue, we propose an attention in attention (AiA) module, which enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors. Our AiA module can be readily applied to both self-attention blocks and cross-attention blocks to facilitate feature aggregation and information propagation for visual tracking. Moreover, we propose a streamlined Transformer tracking framework, dubbed AiATrack, by introducing efficient feature reuse and target-background embeddings to make full use of temporal references. Experiments show that our tracker achieves state-of-the-art performance on six tracking benchmarks while running at a real-time speed.
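
To illustrate what "seeking consensus among all correlation vectors" might look like in practice, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes that each column of the query-key correlation map (the correlations of one key with all queries) is treated as a token, and that a small inner attention over those tokens refines the map residually before the usual softmax. The function name `attention_in_attention` and the projection matrices `w_q` and `w_k` are hypothetical, introduced only for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_in_attention(q, k, v, w_q, w_k):
    """Illustrative attention with an inner attention over correlation vectors.

    q: (n_q, d) queries, k: (n_k, d) keys, v: (n_k, d_v) values.
    w_q, w_k: (n_q, d_inner) hypothetical projections for the inner attention.
    Returns the aggregated features (n_q, d_v) and the weights (n_q, n_k).
    """
    d = q.shape[-1]
    d_inner = w_q.shape[-1]

    # Outer correlation map: entry (i, j) relates query i to key j.
    corr = q @ k.T / np.sqrt(d)                      # (n_q, n_k)

    # Assumption: each column of corr is one correlation vector (one token).
    tokens = corr.T                                  # (n_k, n_q)

    # Inner attention lets correlation vectors attend to one another.
    inner_q = tokens @ w_q                           # (n_k, d_inner)
    inner_k = tokens @ w_k                           # (n_k, d_inner)
    inner_w = softmax(inner_q @ inner_k.T / np.sqrt(d_inner), axis=-1)

    # Residual refinement: each vector is adjusted by the ones it agrees with.
    refined = tokens + inner_w @ tokens              # (n_k, n_q)

    # Back to (n_q, n_k), then the usual softmax over keys and aggregation.
    weights = softmax(refined.T, axis=-1)
    return weights @ v, weights
```

In this sketch the residual connection keeps the original correlations and only adds a correction drawn from similar correlation vectors, which is one simple way to read "enhances appropriate correlations and suppresses erroneous ones". The normalization and projection details of the published module may differ.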

Cite

Text

Gao et al. "AiATrack: Attention in Attention for Transformer Visual Tracking." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-20047-2_9

Markdown

[Gao et al. "AiATrack: Attention in Attention for Transformer Visual Tracking." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/gao2022eccv-aiatrack/) doi:10.1007/978-3-031-20047-2_9

BibTeX

@inproceedings{gao2022eccv-aiatrack,
  title     = {{AiATrack: Attention in Attention for Transformer Visual Tracking}},
  author    = {Gao, Shenyuan and Zhou, Chunluan and Ma, Chao and Wang, Xinggang and Yuan, Junsong},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-20047-2_9},
  url       = {https://mlanthology.org/eccv/2022/gao2022eccv-aiatrack/}
}