Towards Sequence-Level Training for Visual Tracking

Abstract

Despite the extensive adoption of machine learning for visual object tracking, recent learning-based approaches have largely overlooked the fact that visual tracking is by nature a sequence-level task; they rely heavily on frame-level training, which inevitably induces inconsistency between training and testing in terms of both data distributions and task objectives. This work introduces a sequence-level training strategy for visual tracking based on reinforcement learning and discusses how a sequence-level design of data sampling, learning objectives, and data augmentation can improve the accuracy and robustness of tracking algorithms. Our experiments on standard benchmarks, including LaSOT, TrackingNet, and GOT-10k, demonstrate that four representative tracking models (SiamRPN++, SiamAttn, TransT, and TrDiMP) consistently improve when the proposed methods are incorporated into training, without any change to their architectures.
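To make the frame-level versus sequence-level distinction concrete, the following is a minimal sketch of a sequence-level learning signal. It is not the authors' implementation: the abstract does not specify the reward or the policy-gradient variant, so the mean-IoU reward, the baseline-subtracted advantage, and all function names here (`iou`, `sequence_reward`, `advantage`) are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sequence_reward(pred: List[Box], gt: List[Box]) -> float:
    """Hypothetical sequence-level reward: mean IoU over a whole tracked sequence."""
    if not gt or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(iou(p, g) for p, g in zip(pred, gt)) / len(gt)


def advantage(sampled: List[Box], baseline: List[Box], gt: List[Box]) -> float:
    """Reward of a sampled rollout minus the reward of a baseline rollout.

    In a REINFORCE-style update (assumed here, not confirmed by the abstract),
    this scalar would weight the log-probabilities of the sampled predictions,
    so the tracker is credited for the outcome of the entire sequence rather
    than for each frame in isolation.
    """
    return sequence_reward(sampled, gt) - sequence_reward(baseline, gt)


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
    sampled = [(0, 0, 10, 10), (0, 0, 10, 5)]    # second frame covers half the target
    baseline = [(0, 0, 10, 10), (5, 0, 15, 10)]  # second frame drifts sideways
    print(sequence_reward(sampled, gt), advantage(sampled, baseline, gt))
```

In this toy case the sampled rollout scores higher over the sequence than the baseline, so its advantage is positive and a policy-gradient update would make those predictions more likely. A frame-level loss, by contrast, would score each frame independently of how the tracker's own earlier outputs affected later frames.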

Cite

Text

Kim et al. "Towards Sequence-Level Training for Visual Tracking." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-20047-2_31

Markdown

[Kim et al. "Towards Sequence-Level Training for Visual Tracking." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/kim2022eccv-sequencelevel/) doi:10.1007/978-3-031-20047-2_31

BibTeX

@inproceedings{kim2022eccv-sequencelevel,
  title     = {{Towards Sequence-Level Training for Visual Tracking}},
  author    = {Kim, Minji and Lee, Seungkwan and Ok, Jungseul and Han, Bohyung and Cho, Minsu},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-20047-2_31},
  url       = {https://mlanthology.org/eccv/2022/kim2022eccv-sequencelevel/}
}