Progressive Unsupervised Learning for Visual Object Tracking

Abstract

In this paper, we propose a progressive unsupervised learning (PUL) framework, which entirely removes the need for annotated training videos in visual tracking. Specifically, we first learn a background discrimination (BD) model that effectively distinguishes an object from background through contrastive learning. We then employ the BD model to progressively mine temporally corresponding patches (i.e., patches connected by a track) in sequential frames. Because the BD model is imperfect, the mined patch pairs are noisy; we therefore propose a noise-robust loss function to learn temporal correspondences more effectively from this noisy data. We use the proposed noise-robust loss to train backbone networks of Siamese trackers. Without online fine-tuning or adaptation, our unsupervised real-time Siamese trackers outperform state-of-the-art unsupervised deep trackers and achieve results competitive with supervised baselines.
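The abstract's key idea is a loss that tolerates mislabeled (noisy) patch pairs mined by the imperfect BD model. As a minimal illustrative sketch, not the paper's actual formulation, the generalized cross-entropy loss (1 - p^q) / q shows the general principle: it interpolates between standard cross-entropy (q → 0, fast to learn but noise-sensitive) and mean absolute error (q = 1, noise-robust), so low-confidence pairs contribute bounded loss instead of dominating training.

```python
import math

def noise_robust_loss(match_probs, q=0.5):
    """Illustrative noise-robust loss over mined patch pairs.

    match_probs: model's matching probability p in (0, 1] for each
    mined pair. The per-pair loss (1 - p**q) / q is bounded for any
    p > 0, so a wrongly mined (noisy) pair with low p cannot blow up
    the gradient the way -log(p) would. (Hypothetical stand-in for
    the paper's loss, shown only to illustrate the robustness idea.)
    """
    assert 0.0 < q <= 1.0
    return sum((1.0 - p ** q) / q for p in match_probs) / len(match_probs)

# A confidently matched pair incurs near-zero loss; a noisy,
# low-confidence pair incurs a larger but bounded loss.
clean_pair_loss = noise_robust_loss([0.95])
noisy_pair_loss = noise_robust_loss([0.10])
```

The q parameter trades learning speed against robustness; with many noisy mined pairs, a larger q (closer to 1) bounds their influence more aggressively.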

Cite

Text

Wu et al. "Progressive Unsupervised Learning for Visual Object Tracking." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00301

Markdown

[Wu et al. "Progressive Unsupervised Learning for Visual Object Tracking." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/wu2021cvpr-progressive/) doi:10.1109/CVPR46437.2021.00301

BibTeX

@inproceedings{wu2021cvpr-progressive,
  title     = {{Progressive Unsupervised Learning for Visual Object Tracking}},
  author    = {Wu, Qiangqiang and Wan, Jia and Chan, Antoni B.},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {2993--3002},
  doi       = {10.1109/CVPR46437.2021.00301},
  url       = {https://mlanthology.org/cvpr/2021/wu2021cvpr-progressive/}
}