ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe
Abstract
We present ARTrackV2, which integrates two pivotal aspects of tracking: determining where to look (localization) and how to describe (appearance analysis) the target object across video frames. Building on the foundation of its predecessor, ARTrackV2 extends the concept by introducing a unified generative framework to "read out" the object's trajectory and "retell" its appearance in an autoregressive manner. This approach fosters a time-continuous methodology that models the joint evolution of motion and visual features, guided by previous estimates. Furthermore, ARTrackV2 stands out for its efficiency and simplicity, obviating the less efficient intra-frame autoregression and hand-tuned parameters for appearance updates. Despite its simplicity, ARTrackV2 achieves state-of-the-art performance on prevailing benchmark datasets while demonstrating a remarkable efficiency improvement. In particular, ARTrackV2 achieves an AO score of 79.5% on GOT-10k and an AUC of 86.1% on TrackingNet while being 3.6× faster than ARTrack.
Cite
Text
Bai et al. "ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01802
Markdown
[Bai et al. "ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/bai2024cvpr-artrackv2/) doi:10.1109/CVPR52733.2024.01802
BibTeX
@inproceedings{bai2024cvpr-artrackv2,
title = {{ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe}},
author = {Bai, Yifan and Zhao, Zeyang and Gong, Yihong and Wei, Xing},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {19048--19057},
doi = {10.1109/CVPR52733.2024.01802},
url = {https://mlanthology.org/cvpr/2024/bai2024cvpr-artrackv2/}
}