Adaptive and Background-Aware Vision Transformer for Real-Time UAV Tracking

Abstract

While discriminative correlation filter (DCF)-based trackers prevail in UAV tracking because of their efficiency, lightweight convolutional neural network (CNN)-based trackers using filter pruning have also demonstrated remarkable efficiency and precision. However, the use of pure vision transformers (ViTs) for UAV tracking remains unexplored, which is surprising given that ViTs have been shown to produce better performance and greater efficiency than CNNs in image classification. In this paper, we propose an efficient ViT-based tracking framework, Aba-ViTrack, for UAV tracking. In our framework, feature learning and template-search coupling are integrated into an efficient one-stream ViT to avoid an extra heavy relation modeling module. The proposed Aba-ViT exploits an adaptive and background-aware token computation method to reduce inference time. This approach adaptively discards tokens based on learned halting probabilities, which are a priori higher for background tokens than for target tokens. Extensive experiments on six UAV tracking benchmarks demonstrate that the proposed Aba-ViTrack achieves state-of-the-art performance in UAV tracking. Code is available at https://github.com/xyyang317/Aba-ViTrack.
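
The token-discarding idea in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or its repository. It assumes a cumulative halting rule in the style of adaptive computation time: a token stops being processed once its summed per-layer halting probabilities reach `1 - eps`. The function name `halting_depths`, the threshold `eps`, and the example probabilities are all hypothetical, and the probabilities are hard-coded here rather than learned.

```python
from typing import List


def halting_depths(halt_probs: List[List[float]], eps: float = 0.01) -> List[int]:
    """Return, for each token, how many layers process it before it is discarded.

    halt_probs[l][k] is the (assumed, normally learned) halting probability
    of token k at layer l. The cumulative-sum rule is an assumption made
    for illustration, not the paper's confirmed formulation.
    """
    num_layers = len(halt_probs)
    num_tokens = len(halt_probs[0])
    cumulative = [0.0] * num_tokens      # running sum of halting probabilities
    depth = [num_layers] * num_tokens    # default: token survives every layer
    active = [True] * num_tokens

    for layer, probs in enumerate(halt_probs):
        for k in range(num_tokens):
            if not active[k]:
                continue                 # discarded tokens cost no further compute
            cumulative[k] += probs[k]
            if cumulative[k] >= 1.0 - eps:
                active[k] = False        # halt this token after the current layer
                depth[k] = layer + 1
    return depth


# Hypothetical example: tokens 0-1 are background (high halting probability),
# tokens 2-3 lie on the target (low halting probability).
per_layer = [0.6, 0.6, 0.1, 0.1]
depths = halting_depths([per_layer] * 3)
print(depths)       # [2, 2, 3, 3]
print(sum(depths))  # 10 token-layer evaluations instead of 12
```

In this toy setting the background tokens are dropped one layer earlier than the target tokens, which is the source of the inference-time saving the abstract describes.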

Cite

Text

Li et al. "Adaptive and Background-Aware Vision Transformer for Real-Time UAV Tracking." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01286

Markdown

[Li et al. "Adaptive and Background-Aware Vision Transformer for Real-Time UAV Tracking." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/li2023iccv-adaptive/) doi:10.1109/ICCV51070.2023.01286

BibTeX

@inproceedings{li2023iccv-adaptive,
  title     = {{Adaptive and Background-Aware Vision Transformer for Real-Time UAV Tracking}},
  author    = {Li, Shuiwang and Yang, Yangxiang and Zeng, Dan and Wang, Xucheng},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {13989-14000},
  doi       = {10.1109/ICCV51070.2023.01286},
  url       = {https://mlanthology.org/iccv/2023/li2023iccv-adaptive/}
}