Local All-Pair Correspondence for Point Tracking

Abstract

We introduce LocoTrack, a highly accurate and efficient model designed for the task of tracking any point (TAP) across video sequences. Previous approaches to this task often rely on local 2D correlation maps to establish correspondences from a point in the query image to a local region in the target image, which often struggle in homogeneous regions or with repetitive features, leading to matching ambiguities. LocoTrack overcomes this challenge with a novel approach that utilizes all-pair correspondences across regions, i.e., local 4D correlation, to establish precise correspondences, with bidirectional correspondence and matching smoothness significantly enhancing robustness against ambiguities. We also incorporate a lightweight correlation encoder to enhance computational efficiency, and a compact Transformer architecture to integrate long-term temporal information. LocoTrack achieves unmatched accuracy on all TAP-Vid benchmarks and operates at a speed almost 6× faster than the current state-of-the-art.
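To make the "local 4D correlation" idea concrete, the sketch below (not the authors' code; shapes, the window radius, and the centering of the target window are assumptions) correlates every feature in a small source window around the query point with every feature in a window around the current track estimate, yielding a 4D volume instead of the single 2D map used by point-to-region approaches:

```python
# Minimal sketch of a local 4D correlation volume, assuming [C, H, W] feature
# maps and a hypothetical window radius; this is illustrative only.
import torch
import torch.nn.functional as F


def sample_local_patch(feat, center_xy, radius):
    """Bilinearly sample a (2r+1)x(2r+1) patch of features around center_xy."""
    C, H, W = feat.shape
    dy, dx = torch.meshgrid(
        torch.arange(-radius, radius + 1, dtype=torch.float32),
        torch.arange(-radius, radius + 1, dtype=torch.float32),
        indexing="ij",
    )
    grid = torch.stack([center_xy[0] + dx, center_xy[1] + dy], dim=-1)  # [K, K, 2]
    # Normalize to [-1, 1] as required by grid_sample (x = width, y = height).
    grid[..., 0] = 2 * grid[..., 0] / (W - 1) - 1
    grid[..., 1] = 2 * grid[..., 1] / (H - 1) - 1
    return F.grid_sample(feat[None], grid[None], align_corners=True)[0]  # [C, K, K]


def local_4d_correlation(feat_src, feat_tgt, query_xy, estimate_xy, radius=3):
    """All-pair correlation between a source window around the query point and a
    target window around the current estimate. Returns a [K, K, K, K] volume."""
    C = feat_src.shape[0]
    patch_src = sample_local_patch(feat_src, query_xy, radius)      # [C, K, K]
    patch_tgt = sample_local_patch(feat_tgt, estimate_xy, radius)   # [C, K, K]
    # Every source location vs. every target location (scaled dot product).
    return torch.einsum("cij,ckl->ijkl", patch_src, patch_tgt) / C ** 0.5
```

In this reading, the extra two dimensions expose how the whole source neighborhood matches the whole target neighborhood, which is what lets bidirectional consistency and matching smoothness be exploited to resolve ambiguous, repetitive-texture matches.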

Cite

Text

Cho et al. "Local All-Pair Correspondence for Point Tracking." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72684-2_18

Markdown

[Cho et al. "Local All-Pair Correspondence for Point Tracking." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/cho2024eccv-local/) doi:10.1007/978-3-031-72684-2_18

BibTeX

@inproceedings{cho2024eccv-local,
  title     = {{Local All-Pair Correspondence for Point Tracking}},
  author    = {Cho, Seokju and Huang, Jiahui and Nam, Jisu and An, Honggyu and Kim, Seungryong and Lee, Joon-Young},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72684-2_18},
  url       = {https://mlanthology.org/eccv/2024/cho2024eccv-local/}
}