RPT: Learning Point Set Representation for Siamese Visual Tracking

Abstract

While remarkable progress has been made in robust visual tracking, accurate target state estimation still remains a highly challenging problem. In this paper, we argue that this issue is closely related to the prevalent bounding box representation, which provides only a coarse spatial extent of the object. Thus an efficient visual tracking framework is proposed to accurately estimate the target state with a finer representation as a set of representative points. The point set is trained to indicate the semantically and geometrically significant positions of the target region, enabling more fine-grained localization and modeling of object appearance. We further propose a multi-level aggregation strategy to obtain detailed structure information by fusing hierarchical convolution layers. Extensive experiments on several challenging benchmarks including OTB2015, VOT2018, VOT2019 and GOT-10k demonstrate that our method achieves new state-of-the-art performance while running at over 20 FPS.

Cite

Text

Ma et al. "RPT: Learning Point Set Representation for Siamese Visual Tracking." European Conference on Computer Vision Workshops, 2020. doi:10.1007/978-3-030-68238-5_43

Markdown

[Ma et al. "RPT: Learning Point Set Representation for Siamese Visual Tracking." European Conference on Computer Vision Workshops, 2020.](https://mlanthology.org/eccvw/2020/ma2020eccvw-rpt/) doi:10.1007/978-3-030-68238-5_43

BibTeX

@inproceedings{ma2020eccvw-rpt,
  title     = {{RPT: Learning Point Set Representation for Siamese Visual Tracking}},
  author    = {Ma, Ziang and Wang, Linyuan and Zhang, Haitao and Lu, Wei and Yin, Jun},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2020},
  pages     = {653--665},
  doi       = {10.1007/978-3-030-68238-5_43},
  url       = {https://mlanthology.org/eccvw/2020/ma2020eccvw-rpt/}
}