Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking

Abstract

Visual tracking has made substantial strides by harnessing transformer-based models. However, the slow speed of current trackers limits their practicality on devices with constrained computational capabilities, especially for real-time unmanned aerial vehicle (UAV) tracking. To address this challenge, we introduce AVTrack, an adaptive computation framework that selectively activates transformer blocks for real-time UAV tracking. Our novel Activation Module (AM) dynamically optimizes the ViT architecture, selectively engaging relevant components to improve inference efficiency without significantly compromising tracking performance. Moreover, we bolster the effectiveness of ViTs, particularly against the extreme changes in viewing angle commonly encountered in UAV tracking, by learning view-invariant representations through mutual information maximization. Extensive experiments on five tracking benchmarks affirm the effectiveness and versatility of our approach, positioning it as a state-of-the-art solution in visual tracking. Code is released at: https://github.com/wuyou3474/AVTrack.
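To make the adaptive-computation idea concrete, below is a minimal PyTorch sketch (not the authors' code) of a per-block activation gate that decides whether a ViT block runs for a given input. The class names (`BlockActivationGate`, `AdaptiveViTBlock`), the pooling-based gate, and the skip threshold are illustrative assumptions rather than AVTrack's actual Activation Module; see the linked repository for the official implementation.

```python
# Illustrative sketch of per-block adaptive activation in a ViT.
# All names and design choices here are assumptions, not AVTrack's implementation.

import torch
import torch.nn as nn


class BlockActivationGate(nn.Module):
    """Predicts a scalar 'keep' probability for one transformer block (hypothetical)."""

    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Pool the tokens and map to a probability in (0, 1).
        return torch.sigmoid(self.head(tokens.mean(dim=1)))  # shape (B, 1)


class AdaptiveViTBlock(nn.Module):
    """Wraps a standard transformer block and skips it when the gate output is low."""

    def __init__(self, dim: int = 256, num_heads: int = 8, threshold: float = 0.5):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True
        )
        self.gate = BlockActivationGate(dim)
        self.threshold = threshold

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        keep_prob = self.gate(tokens)  # (B, 1)
        if self.training:
            # During training, blend the block output with the identity path so the
            # gate stays differentiable (a common soft-gating relaxation).
            out = self.block(tokens)
            w = keep_prob.unsqueeze(-1)  # (B, 1, 1), broadcast over tokens and channels
            return w * out + (1 - w) * tokens
        # At inference, skip the block entirely (per batch, for simplicity) to save compute.
        if keep_prob.mean().item() < self.threshold:
            return tokens
        return self.block(tokens)


if __name__ == "__main__":
    x = torch.randn(2, 196, 256)  # (batch, tokens, channels)
    layer = AdaptiveViTBlock().eval()
    with torch.no_grad():
        y = layer(x)
    print(y.shape)  # torch.Size([2, 196, 256])
```

In this sketch, blocks whose gates fall below the threshold are bypassed at inference, which is one plausible way to trade a small amount of accuracy for speed; the paper's view-invariant representation learning via mutual information maximization is a separate component not shown here.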

Cite

Text

Li et al. "Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking." International Conference on Machine Learning, 2024.

Markdown

[Li et al. "Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/li2024icml-learning-a/)

BibTeX

@inproceedings{li2024icml-learning-a,
  title     = {{Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking}},
  author    = {Li, Yongxin and Liu, Mengyuan and Wu, You and Wang, Xucheng and Yang, Xiangyang and Li, Shuiwang},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {28403--28420},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/li2024icml-learning-a/}
}