AF2S: An Anchor-Free Two-Stage Tracker Based on a Strong SiamFC Baseline
Abstract
Siamese-network-based trackers have become mainstream in visual object tracking. Recently, several high-performance multi-stage trackers have been proposed, and some of them adopt SiamRPN for first-stage region proposal. We argue that an anchor-based region proposal network is not necessary for the tracking task, because a tracker already has a strong prior on the location and size of the target. In this paper, we propose a two-stage visual tracker that uses SiamFC for region proposal. SiamFC defines a bounding box by its center and is thus a typical anchor-free (AF) network, so we dub our tracker AF2S (anchor-free two-stage). As the model size of SiamFC is only about 1/10 that of SiamRPN, AF2S yields a significantly lighter model than its SiamRPN-based counterparts. In the design of AF2S, we first build a strong AlexNet-based SiamFC baseline that raises the AUC on OTB-100 from 0.582 to 0.665. Next, we propose a position-sensitive convolutional layer that can be stacked after the SiamFC backbone to increase the robustness of proposals without losing localization precision. Finally, a relation network is used for box refinement. Experimental results show that AF2S achieves the best performance on OTB-100 and VOT-18 among state-of-the-art trackers that use AlexNet as the backbone. On LaSOT-test, AF2S achieves an AUC of 0.480, which places it in the first tier even when trackers with more powerful backbones and much larger models are considered.
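To make the "anchor-free" remark concrete: SiamFC scores every position of a search region by cross-correlating exemplar (template) features with search-region features, and the target center is read off the peak of the resulting response map, with no anchor boxes involved. The following is a minimal NumPy sketch, not the AF2S implementation. It assumes features have already been extracted (the backbone is omitted), uses a total stride of 8 as in the original SiamFC, and ignores response upsampling, the cosine window, and multi-scale search. The function names and the `crop_scale` parameter are hypothetical illustrations.

```python
import numpy as np


def cross_correlate(template_feat, search_feat):
    """Slide a (C, h, w) template over a (C, H, W) search map.

    Returns an (H - h + 1, W - w + 1) response map whose entries are the
    summed element-wise products (an unnormalized cross-correlation).
    """
    c, h, w = template_feat.shape
    c2, H, W = search_feat.shape
    assert c == c2, "template and search features need the same channels"
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            window = search_feat[:, i:i + h, j:j + w]
            out[i, j] = np.sum(template_feat * window)
    return out


def peak_displacement(response, total_stride=8):
    """Offset (dx, dy) of the response peak from the map center, in search-crop pixels."""
    n_rows, n_cols = response.shape
    r, c = np.unravel_index(np.argmax(response), response.shape)
    dy = (r - (n_rows - 1) / 2) * total_stride
    dx = (c - (n_cols - 1) / 2) * total_stride
    return float(dx), float(dy)


def update_box(prev_box, response, total_stride=8, crop_scale=1.0):
    """Anchor-free update: move the box center to the peak, keep its size.

    prev_box is (cx, cy, w, h) in frame pixels. crop_scale is an assumed
    factor (search-crop pixels per frame pixel) known from cropping. Box
    size is simply carried over; AF2S instead refines boxes in a second
    stage with a relation network.
    """
    cx, cy, w, h = prev_box
    dx, dy = peak_displacement(response, total_stride)
    return (cx + dx / crop_scale, cy + dy / crop_scale, w, h)
```

The design point is that localization reduces to an `argmax` over a dense score map: the only learned quantity is the similarity score at each position, which is why the proposal stage stays small compared with an anchor-based RPN head.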
Cite
Text
He et al. "AF2S: An Anchor-Free Two-Stage Tracker Based on a Strong SiamFC Baseline." European Conference on Computer Vision Workshops, 2020. doi:10.1007/978-3-030-68238-5_42
Markdown
[He et al. "AF2S: An Anchor-Free Two-Stage Tracker Based on a Strong SiamFC Baseline." European Conference on Computer Vision Workshops, 2020.](https://mlanthology.org/eccvw/2020/he2020eccvw-af2s/) doi:10.1007/978-3-030-68238-5_42
BibTeX
@inproceedings{he2020eccvw-af2s,
title = {{AF2S: An Anchor-Free Two-Stage Tracker Based on a Strong SiamFC Baseline}},
author = {He, Anfeng and Wang, Guangting and Luo, Chong and Tian, Xinmei and Zeng, Wenjun},
booktitle = {European Conference on Computer Vision Workshops},
year = {2020},
pages = {637--652},
doi = {10.1007/978-3-030-68238-5_42},
url = {https://mlanthology.org/eccvw/2020/he2020eccvw-af2s/}
}