Explicit Visual Prompts for Visual Object Tracking

Shi, Liangtao; Zhong, Bineng; Liang, Qihua; Li, Ning; Zhang, Shengping; Li, Xianxian

doi:10.1609/AAAI.V38I5.28286

AAAI 2024 pp. 4838-4846


Abstract

Effectively exploiting spatio-temporal information is crucial for capturing target appearance changes in visual tracking. However, most deep learning-based trackers focus on designing a complicated appearance model or template updating strategy while neglecting the context between consecutive frames, which leads to the when-and-how-to-update dilemma. To address these issues, we propose a novel explicit visual prompts framework for visual tracking, dubbed EVPTrack. Specifically, we utilize spatio-temporal tokens to propagate information between consecutive frames without focusing on updating templates. As a result, we can not only alleviate the challenge of when-to-update, but also avoid the hyper-parameters associated with updating strategies. Then, we utilize the spatio-temporal tokens to generate explicit visual prompts that facilitate inference in the current frame. The prompts are fed into a transformer encoder together with the image tokens without additional processing. Consequently, the efficiency of our model is improved by avoiding how-to-update. In addition, we consider multi-scale information as explicit visual prompts, providing multi-scale template features to enhance EVPTrack's ability to handle target scale changes. Extensive experimental results on six benchmarks (LaSOT, LaSOText, GOT-10k, UAV123, TrackingNet, and TNL2K) validate that EVPTrack achieves competitive performance at real-time speed by effectively exploiting both spatio-temporal and multi-scale information. Code and models are available at https://github.com/GXNU-ZhongLab/EVPTrack.
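The token-propagation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the repository linked above for that); it only shows, under simplifying assumptions, how a small set of tokens might be concatenated with image tokens, passed through a joint attention layer, and carried to the next frame. The function names `self_attention` and `track_step`, the single-head attention layer, the token counts, and the random weights are all hypothetical. The sketch also uses the carried tokens directly as prompts, whereas the paper generates prompts from them, and it omits the multi-scale prompts.

```python
import numpy as np

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over a (n_tokens, dim) array (toy stand-in
    for a transformer encoder layer; an assumption, not the paper's encoder)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return tokens + weights @ v  # residual connection

def track_step(image_tokens, st_tokens, params):
    """Process one frame: prepend the carried tokens to the image tokens,
    encode them jointly, then split the result back into the two groups."""
    n_prompt = st_tokens.shape[0]
    joint = np.concatenate([st_tokens, image_tokens], axis=0)
    encoded = self_attention(joint, *params)
    return encoded[n_prompt:], encoded[:n_prompt]

rng = np.random.default_rng(0)
dim, n_prompt, n_image = 16, 4, 64  # hypothetical sizes
params = tuple(rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))

st_tokens = np.zeros((n_prompt, dim))  # no history before the first frame
for _ in range(3):  # three consecutive frames of random stand-in features
    image_tokens = rng.normal(size=(n_image, dim))
    image_features, st_tokens = track_step(image_tokens, st_tokens, params)
```

Because the carried tokens are updated on every frame as a by-product of encoding, the loop contains no separate template-update decision, which is the property the abstract attributes to avoiding the when-to-update and how-to-update questions.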


Cite

Text

Shi et al. "Explicit Visual Prompts for Visual Object Tracking." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I5.28286

Markdown

[Shi et al. "Explicit Visual Prompts for Visual Object Tracking." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/shi2024aaai-explicit/) doi:10.1609/AAAI.V38I5.28286

BibTeX

@inproceedings{shi2024aaai-explicit,
  title     = {{Explicit Visual Prompts for Visual Object Tracking}},
  author    = {Shi, Liangtao and Zhong, Bineng and Liang, Qihua and Li, Ning and Zhang, Shengping and Li, Xianxian},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {4838--4846},
  doi       = {10.1609/AAAI.V38I5.28286},
  url       = {https://mlanthology.org/aaai/2024/shi2024aaai-explicit/}
}