Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking

Abstract

The success of visual tracking has been largely driven by datasets with manual box annotations. However, these box annotations require tremendous human effort, limiting the scale and diversity of existing tracking datasets. In this work, we present a novel Self-Supervised Tracking framework, named SSTrack, designed to eliminate the need for box annotations. Specifically, a decoupled spatio-temporal consistency training framework is proposed to learn rich target information across timestamps through global spatial localization and local temporal association. This allows for the simulation of appearance and motion variations of instances in real-world scenarios. Furthermore, an instance contrastive loss is designed to learn instance-level correspondences from a multi-view perspective, offering robust instance supervision without additional labels. This new design paradigm enables SSTrack to effectively learn generic tracking representations in a self-supervised manner while reducing reliance on extensive box annotations. Extensive experiments on nine benchmark datasets demonstrate that SSTrack surpasses state-of-the-art (SOTA) self-supervised tracking methods, achieving improvements of more than 25.3%, 20.4%, and 14.8% in AUC (AO) score on the GOT-10k, LaSOT, and TrackingNet datasets, respectively.
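The abstract does not give the exact formulation of the instance contrastive loss. A common way to learn instance-level correspondences from two views without labels is an InfoNCE-style objective, in which the same instance seen in two views forms a positive pair and all other instances in the batch act as negatives. The sketch below assumes that form; it is an illustration of the general technique, not the authors' loss. The names `l2_normalize`, `instance_contrastive_loss`, and the `temperature` value are hypothetical choices made for this example.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def instance_contrastive_loss(view_a: List[Vector], view_b: List[Vector],
                              temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss (assumed form, not the paper's exact loss).

    Row i of view_a and row i of view_b are embeddings of the same instance
    seen from two views (positive pair); every other row is a negative.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    n = len(a)

    def one_direction(queries: List[Vector], keys: List[Vector]) -> float:
        total = 0.0
        for i in range(n):
            # Temperature-scaled similarities of query i to every key.
            logits = [dot(queries[i], keys[j]) / temperature for j in range(n)]
            # Numerically stable log-sum-exp.
            m = max(logits)
            log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
            # Cross-entropy with the matching index i as the target.
            total += log_sum_exp - logits[i]
        return total / n

    # Average both directions so neither view is privileged.
    return 0.5 * (one_direction(a, b) + one_direction(b, a))


if __name__ == "__main__":
    eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Correct correspondences: the loss is close to 0.
    print(instance_contrastive_loss(eye, eye))
    # Correspondences shifted by one instance: the loss is roughly 10.
    print(instance_contrastive_loss(eye, eye[1:] + eye[:1]))
```

Minimizing such a loss pulls the two views of each instance together and pushes different instances apart, which is one way to obtain instance supervision without extra labels; a lower temperature sharpens the penalty on hard negatives.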

Cite

Text

Zheng et al. "Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I10.33155

Markdown

[Zheng et al. "Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zheng2025aaai-decoupled/) doi:10.1609/AAAI.V39I10.33155

BibTeX

@inproceedings{zheng2025aaai-decoupled,
  title     = {{Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking}},
  author    = {Zheng, Yaozong and Zhong, Bineng and Liang, Qihua and Li, Ning and Song, Shuxiang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {10635-10643},
  doi       = {10.1609/AAAI.V39I10.33155},
  url       = {https://mlanthology.org/aaai/2025/zheng2025aaai-decoupled/}
}