TAPIR: Tracking Any Point with Per-Frame Initialization and Temporal Refinement

Abstract

We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations. The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS. Our model facilitates fast inference on long and high-resolution video sequences. On a modern GPU, our implementation has the capacity to track points faster than real-time. Given the high-quality trajectories extracted from a large dataset, we demonstrate a proof-of-concept diffusion model which generates trajectories from static images, enabling plausible animations. Visualizations, source code, and pretrained models can be found at https://deepmind-tapir.github.io.
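The matching stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes per-frame feature grids are already available, scores each location with a plain dot product against the query feature, and takes a soft argmax per frame. The function name `match_per_frame`, the `temperature` parameter, and the toy features are hypothetical, and the refinement stage is omitted.

```python
import math

def match_per_frame(query_feat, frame_feats, temperature=0.05):
    """Independently locate the best match for the query on every frame.

    query_feat:  feature vector of the query point (list of floats).
    frame_feats: one grid per frame, where grid[y][x] is a feature vector.
    Returns one (x, y) soft-argmax position per frame.
    """
    tracks = []
    for feat_map in frame_feats:
        # Similarity of the query to every spatial location in this frame.
        scores = [
            (x, y, sum(q * f for q, f in zip(query_feat, vec)))
            for y, row in enumerate(feat_map)
            for x, vec in enumerate(row)
        ]
        # Softmax over locations (shifted by the peak for numerical stability).
        peak = max(s for _, _, s in scores)
        weights = [(x, y, math.exp((s - peak) / temperature)) for x, y, s in scores]
        total = sum(w for _, _, w in weights)
        # Soft argmax: expected position under the softmax weights.
        tracks.append((
            sum(x * w for x, _, w in weights) / total,
            sum(y * w for _, y, w in weights) / total,
        ))
    return tracks

# Toy video: two frames on a 3x2 grid; the target feature moves between frames.
target, other = [1.0, 0.0], [0.0, 1.0]
frame0 = [[other, other, target], [other, other, other]]  # target at x=2, y=0
frame1 = [[other, other, other], [target, other, other]]  # target at x=0, y=1
tracks = match_per_frame(target, [frame0, frame1])
```

Because each frame is matched on its own, this step parallelizes across frames. In the abstract's description, the second stage then updates both the trajectory and the query features from local correlations around these initial estimates.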

Cite

Text

Doersch et al. "TAPIR: Tracking Any Point with Per-Frame Initialization and Temporal Refinement." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00923

Markdown

[Doersch et al. "TAPIR: Tracking Any Point with Per-Frame Initialization and Temporal Refinement." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/doersch2023iccv-tapir/) doi:10.1109/ICCV51070.2023.00923

BibTeX

@inproceedings{doersch2023iccv-tapir,
  title     = {{TAPIR: Tracking Any Point with Per-Frame Initialization and Temporal Refinement}},
  author    = {Doersch, Carl and Yang, Yi and Vecerik, Mel and Gokay, Dilara and Gupta, Ankush and Aytar, Yusuf and Carreira, Joao and Zisserman, Andrew},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {10061--10072},
  doi       = {10.1109/ICCV51070.2023.00923},
  url       = {https://mlanthology.org/iccv/2023/doersch2023iccv-tapir/}
}