Self-Supervised Any-Point Tracking by Contrastive Random Walks

Abstract

We present a simple, self-supervised approach to the Tracking Any Point (TAP) problem. We train a global matching transformer to find cycle-consistent tracks through video via contrastive random walks, using the transformer's attention-based global matching to define the transition matrices for a random walk on a space-time graph. The ability to perform "all pairs" comparisons between points allows the model to obtain high spatial precision and a strong contrastive learning signal, while avoiding many of the complexities of recent approaches (such as coarse-to-fine matching). To make this possible, we propose a number of design decisions that allow global matching architectures to be trained through self-supervision using cycle consistency. For example, we identify that transformer-based methods are sensitive to shortcut solutions, and propose a data augmentation scheme to address them. Our method achieves strong performance on the TapVid benchmarks, outperforming previous self-supervised tracking methods, such as DIFT, and is competitive with several supervised methods.
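The core idea of a contrastive random walk, where attention-style affinities define transition matrices that are chained forward and backward through the video and supervised with a cycle-consistency objective, can be sketched in a few lines of PyTorch. The snippet below is an illustrative sketch of that general technique, not the paper's actual implementation; the function names, temperature value, and per-frame point features are assumptions, and the feature encoder itself is omitted.

```python
import torch
import torch.nn.functional as F

def transition_matrix(feats_a, feats_b, temperature=0.07):
    """Attention-style transition matrix between two frames' point features.

    feats_a: (N, D) features for N points in frame t
    feats_b: (N, D) features for the corresponding points in frame t+1
    Returns a row-stochastic (N, N) matrix of transition probabilities.
    """
    feats_a = F.normalize(feats_a, dim=-1)
    feats_b = F.normalize(feats_b, dim=-1)
    logits = feats_a @ feats_b.t() / temperature
    return logits.softmax(dim=-1)

def cycle_consistency_loss(frame_feats, temperature=0.07):
    """Contrastive random walk: chain transitions forward then backward
    through the video and require each point to return to its start.

    frame_feats: list of (N, D) per-frame point features from some encoder
    (hypothetical input; the encoder is not shown here).
    """
    # Palindromic frame order: t = 0, 1, ..., T-1, T-2, ..., 0.
    palindrome = frame_feats + frame_feats[-2::-1]
    n = frame_feats[0].shape[0]
    walk = torch.eye(n, device=frame_feats[0].device)
    for a, b in zip(palindrome[:-1], palindrome[1:]):
        walk = walk @ transition_matrix(a, b, temperature)
    # The chained walk is row-stochastic; penalize probability mass that
    # fails to return to the starting point (cross-entropy to the identity).
    target = torch.arange(n, device=walk.device)
    return F.nll_loss(torch.log(walk + 1e-8), target)
```

In this sketch, the softmax over all pairwise similarities plays the role of the transformer's global matching, and the loss is minimized when each point's round-trip walk concentrates its probability on the point it started from.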

Cite

Text

Shrivastava and Owens. "Self-Supervised Any-Point Tracking by Contrastive Random Walks." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72630-9_16

Markdown

[Shrivastava and Owens. "Self-Supervised Any-Point Tracking by Contrastive Random Walks." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/shrivastava2024eccv-selfsupervised/) doi:10.1007/978-3-031-72630-9_16

BibTeX

@inproceedings{shrivastava2024eccv-selfsupervised,
  title     = {{Self-Supervised Any-Point Tracking by Contrastive Random Walks}},
  author    = {Shrivastava, Ayush and Owens, Andrew},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72630-9_16},
  url       = {https://mlanthology.org/eccv/2024/shrivastava2024eccv-selfsupervised/}
}