TAPTRv2: Attention-Based Position Update Improves Tracking Any Point

Abstract

In this paper, we present TAPTRv2, a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DEtection TRansformer (DETR) and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. TAPTRv2 improves TAPTR by addressing a critical issue regarding its reliance on cost-volume, which contaminates the point query’s content feature and negatively impacts both visibility prediction and cost-volume computation. In TAPTRv2, we propose a novel attention-based position update (APU) operation and use key-aware deformable attention to realize it. For each query, this operation uses key-aware attention weights to combine the corresponding deformable sampling positions and predict a new query position. This design is based on the observation that local attention is essentially the same as cost-volume, both of which are computed by a dot product between a query and its surrounding features. By introducing this new operation, TAPTRv2 not only removes the extra burden of cost-volume computation, but also leads to a substantial performance improvement. TAPTRv2 surpasses TAPTR and achieves state-of-the-art performance on many challenging datasets, demonstrating the effectiveness of our approach.
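The position update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single attention head, no learned projections, and that the key features have already been sampled at the deformable sampling positions. The function name `attention_position_update` and all argument names are hypothetical.

```python
import numpy as np

def attention_position_update(query, sampled_keys, sampling_positions):
    """Illustrative sketch of an attention-weighted position update.

    query:              (d,)   content feature of one point query
    sampled_keys:       (n, d) features sampled around the current position
    sampling_positions: (n, 2) (x, y) locations where those features were sampled

    Assumption: single head, scaled dot-product weights, no learned layers.
    """
    d = query.shape[-1]
    # Key-aware weights: dot product between the query and each sampled key,
    # which plays the same role as a local cost-volume.
    logits = sampled_keys @ query / np.sqrt(d)
    logits = logits - logits.max()           # numerical stability for softmax
    weights = np.exp(logits)
    weights = weights / weights.sum()
    # New position: attention-weighted combination of the sampling positions.
    new_position = weights @ sampling_positions
    return new_position, weights

# Hypothetical usage with random features and four sampling positions.
rng = np.random.default_rng(0)
query = rng.normal(size=8)
keys = rng.normal(size=(4, 8))
positions = np.array([[10.0, 20.0], [12.0, 20.0], [10.0, 22.0], [12.0, 22.0]])
new_pos, w = attention_position_update(query, keys, positions)
```

Because the weights are a softmax, the updated position is a convex combination of the sampling positions and so stays inside the region they span. In this sketch, only the position is updated from the weights, and the query's content feature is left untouched.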

Cite

Text

Li et al. "TAPTRv2: Attention-Based Position Update Improves Tracking Any Point." Neural Information Processing Systems, 2024. doi:10.52202/079017-3205

Markdown

[Li et al. "TAPTRv2: Attention-Based Position Update Improves Tracking Any Point." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/li2024neurips-taptrv2/) doi:10.52202/079017-3205

BibTeX

@inproceedings{li2024neurips-taptrv2,
  title     = {{TAPTRv2: Attention-Based Position Update Improves Tracking Any Point}},
  author    = {Li, Hongyang and Zhang, Hao and Liu, Shilong and Zeng, Zhaoyang and Li, Feng and Ren, Tianhe and Li, Bohan and Zhang, Lei},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-3205},
  url       = {https://mlanthology.org/neurips/2024/li2024neurips-taptrv2/}
}