Learning Dynamic Siamese Network for Visual Object Tracking

Abstract

How to effectively learn the temporal variation of target appearance and exclude the interference of cluttered background, while maintaining real-time response, is an essential problem in visual object tracking. Recently, Siamese networks have shown great potential for matching-based trackers in achieving balanced accuracy and beyond-real-time speed. However, they still lag well behind classification-and-updating-based trackers in tolerating temporal changes of objects and imaging conditions. In this paper, we propose a dynamic Siamese network, built on a fast transformation learning model that enables effective online learning of target appearance variation and background suppression from previous frames. We then present elementwise multi-layer fusion to adaptively integrate the network outputs using multi-level deep features. Unlike state-of-the-art trackers, our approach allows the use of any feasible generally or particularly trained features, such as SiamFC and VGG. More importantly, the proposed dynamic Siamese network can be jointly trained as a whole directly on labeled video sequences, and can thus take full advantage of the rich spatial-temporal information of moving objects. As a result, our approach achieves state-of-the-art performance on the OTB-2013 and VOT-2015 benchmarks, while exhibiting a better balance of accuracy and real-time response than state-of-the-art competitors.
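The elementwise multi-layer fusion mentioned above can be illustrated with a minimal sketch. It assumes each feature level has already produced a response map over the search region, and that each level has a weight map of the same size; the function names, the nested-list representation, and the toy numbers are hypothetical and are not taken from the paper, which learns its fusion weights during training.

```python
from typing import List, Tuple

# A 2-D response map (or weight map) stored as nested lists of floats.
Map = List[List[float]]


def fuse_elementwise(responses: List[Map], weights: List[Map]) -> Map:
    """Fuse per-layer response maps using per-position weights.

    Layer l contributes weights[l][i][j] * responses[l][i][j] at position
    (i, j), so one layer can dominate in one part of the search region
    and contribute little in another.
    """
    if not responses or len(responses) != len(weights):
        raise ValueError("need exactly one weight map per response map")
    rows, cols = len(responses[0]), len(responses[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for resp, w in zip(responses, weights):
        for i in range(rows):
            for j in range(cols):
                fused[i][j] += w[i][j] * resp[i][j]
    return fused


def peak(response: Map) -> Tuple[int, int]:
    """Return the (row, col) of the largest value in a response map."""
    best = (0, 0)
    for i, row in enumerate(response):
        for j, value in enumerate(row):
            if value > response[best[0]][best[1]]:
                best = (i, j)
    return best


# Toy 2x2 response maps from a shallow and a deep feature level
# (illustrative values only).
shallow = [[0.1, 0.9], [0.2, 0.3]]
deep = [[0.8, 0.2], [0.1, 0.4]]

# Hypothetical weight maps; here they sum to 1 at every position.
w_shallow = [[0.25, 0.75], [0.5, 0.5]]
w_deep = [[0.75, 0.25], [0.5, 0.5]]

fused = fuse_elementwise([shallow, deep], [w_shallow, w_deep])
target_position = peak(fused)
```

In this sketch the weights vary by position rather than being a single scalar per layer, which is what distinguishes elementwise fusion from a plain weighted average of the layer outputs.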

Cite

Text

Guo et al. "Learning Dynamic Siamese Network for Visual Object Tracking." International Conference on Computer Vision, 2017. doi:10.1109/ICCV.2017.196

Markdown

[Guo et al. "Learning Dynamic Siamese Network for Visual Object Tracking." International Conference on Computer Vision, 2017.](https://mlanthology.org/iccv/2017/guo2017iccv-learning/) doi:10.1109/ICCV.2017.196

BibTeX

@inproceedings{guo2017iccv-learning,
  title     = {{Learning Dynamic Siamese Network for Visual Object Tracking}},
  author    = {Guo, Qing and Feng, Wei and Zhou, Ce and Huang, Rui and Wan, Liang and Wang, Song},
  booktitle = {International Conference on Computer Vision},
  year      = {2017},
  doi       = {10.1109/ICCV.2017.196},
  url       = {https://mlanthology.org/iccv/2017/guo2017iccv-learning/}
}