CoTracker: It Is Better to Track Together

Abstract

We introduce CoTracker, a transformer-based model that tracks a large number of 2D points in long video sequences. Unlike most existing approaches, which track points independently, CoTracker tracks them jointly, accounting for their dependencies. We show that joint tracking significantly improves tracking accuracy and robustness, and makes it possible to track occluded points and points outside of the camera view. We also introduce several innovations for this class of trackers, including token proxies that significantly improve memory efficiency and allow CoTracker to jointly track 70k points simultaneously at inference on a single GPU. CoTracker is an online algorithm that operates causally on short windows. However, it is trained as a recurrent network over unrolled windows, maintaining tracks for long periods of time even when points are occluded or leave the field of view. Quantitatively, CoTracker substantially outperforms prior trackers on standard point-tracking benchmarks. Code and model weights are available at https://co-tracker.github.io/
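The windowed, recurrent design described in the abstract can be made concrete with a short sketch. The following is a minimal illustration, not CoTracker's actual interface: the function track_online, the model callable, and all tensor shapes here are hypothetical assumptions for exposition. It shows the abstract's key idea of tracking all query points jointly over short, overlapping causal windows, carrying the running estimates from one window into the next; see the project page above for the real implementation.

import torch

def track_online(model, video, queries, window_len=8, stride=4):
    """Track all query points jointly with a sliding causal window.

    video:   (T, C, H, W) tensor of frames.
    queries: (N, 2) tensor of (x, y) points queried on the first frame.
    Returns per-frame coordinates (T, N, 2) and visibilities (T, N).
    """
    T, N = video.shape[0], queries.shape[0]
    tracks = torch.zeros(T, N, 2)
    visibility = torch.zeros(T, N)
    coords = queries.clone()  # running estimate, carried across windows
    for start in range(0, T, stride):
        clip = video[start:start + window_len]
        # Hypothetical call: the model refines the coordinates of every
        # point in the clip *jointly*, so points can support one another
        # through occlusions, as the abstract describes.
        coords_per_frame, vis = model(clip, coords)
        end = start + coords_per_frame.shape[0]
        tracks[start:end] = coords_per_frame
        visibility[start:end] = vis
        coords = coords_per_frame[-1]  # seed the next overlapping window
        if end >= T:
            break
    return tracks, visibility

# Usage with a dummy stand-in model that keeps every point static and
# visible, just to show the shapes flowing through the loop.
dummy = lambda clip, pts: (pts.expand(clip.shape[0], -1, -1).clone(),
                           torch.ones(clip.shape[0], pts.shape[0]))
video = torch.rand(24, 3, 256, 256)
queries = torch.tensor([[64.0, 64.0], [128.0, 200.0]])
tracks, vis = track_online(dummy, video, queries)

Because each window overlaps the previous one and is seeded with its final estimates, the loop behaves like an unrolled recurrent network at inference time, which is how the paper trains the model to maintain tracks across occlusions.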

Cite

Text

Karaev et al. "CoTracker: It Is Better to Track Together." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73033-7_2

Markdown

[Karaev et al. "CoTracker: It Is Better to Track Together." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/karaev2024eccv-cotracker/) doi:10.1007/978-3-031-73033-7_2

BibTeX

@inproceedings{karaev2024eccv-cotracker,
  title     = {{CoTracker: It Is Better to Track Together}},
  author    = {Karaev, Nikita and Rocco, Ignacio and Graham, Ben and Neverova, Natalia and Vedaldi, Andrea and Rupprecht, Christian},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73033-7_2},
  url       = {https://mlanthology.org/eccv/2024/karaev2024eccv-cotracker/}
}