Spatial-Temporal Relation Networks for Multi-Object Tracking

Abstract

Recent progress in multiple object tracking (MOT) has shown that a robust similarity score is key to the success of trackers. A good similarity score is expected to reflect multiple cues, e.g. appearance, location, and topology, over a long period of time. However, these cues are heterogeneous, which makes them hard to combine in a unified network. As a result, existing methods usually encode them in separate networks or require a complex training approach. In this paper, we present a unified framework for similarity measurement based on a spatial-temporal relation network, which can simultaneously encode various cues and perform reasoning across both the spatial and temporal domains. We also study the feature representation of a tracklet-object pair in depth, showing that a proper design of the pair features can effectively empower the trackers. The resulting approach is named spatial-temporal relation networks (STRN). It runs in a feed-forward way and can be trained end-to-end. State-of-the-art accuracy was achieved on all of the MOT15 to MOT17 benchmarks under the public-detection and online settings.
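The abstract describes fusing appearance and location cues through relation reasoning in space and time, then scoring tracklet-object pairs. The following is not the STRN implementation. It is a minimal pure-Python sketch, assuming a relation-module-style attention in which the weight between two objects combines a scaled dot-product appearance term with the log of a geometric weight, followed by a temporal attention over a tracklet's per-frame features and a cosine pair score. The Gaussian geometric weight, `sigma`, the box format, and the helper names (`relation_aggregate`, `temporal_aggregate`, `pair_similarity`) are hypothetical. The learned projections a real network would use are omitted.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical box format: (x_center, y_center, width, height).
Box = Tuple[float, float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def geometric_weight(bi: Box, bj: Box, sigma: float = 1.0) -> float:
    """Assumed hand-crafted location cue (a learned embedding in relation modules):
    Gaussian of the center offset, normalized by box i's size."""
    dx = (bj[0] - bi[0]) / bi[2]
    dy = (bj[1] - bi[1]) / bi[3]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def relation_aggregate(feats: List[List[float]], boxes: List[Box]) -> List[List[float]]:
    """Spatial reasoning (sketch): f_i' = f_i + sum_j w_ij * f_j, where
    w_ij = softmax_j(f_i . f_j / sqrt(d) + log g_ij) mixes appearance and location."""
    d = len(feats[0])
    out = []
    for fi, bi in zip(feats, boxes):
        logits = [
            # Clamp g_ij so that log() stays finite for very distant boxes.
            dot(fi, fj) / math.sqrt(d) + math.log(max(geometric_weight(bi, bj), 1e-6))
            for fj, bj in zip(feats, boxes)
        ]
        weights = softmax(logits)
        relation = [sum(w * fj[k] for w, fj in zip(weights, feats)) for k in range(d)]
        out.append([a + r for a, r in zip(fi, relation)])
    return out


def temporal_aggregate(tracklet: List[List[float]], query: List[float]) -> List[float]:
    """Temporal reasoning (sketch): fuse a tracklet's per-frame features, weighting
    each frame by its scaled appearance affinity to the candidate object."""
    d = len(query)
    weights = softmax([dot(f, query) / math.sqrt(d) for f in tracklet])
    return [sum(w * f[k] for w, f in zip(weights, tracklet)) for k in range(d)]


def pair_similarity(tracklet: List[List[float]], obj: List[float]) -> float:
    """Cosine similarity between the fused tracklet feature and the object feature."""
    t = temporal_aggregate(tracklet, obj)
    denom = math.sqrt(dot(t, t)) * math.sqrt(dot(obj, obj))
    return dot(t, obj) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    boxes = [(0.0, 0.0, 1.0, 1.0), (100.0, 0.0, 1.0, 1.0)]
    print(relation_aggregate(feats, boxes))
    print(pair_similarity([[1.0, 0.0], [0.9, 0.1]], [1.0, 0.0]))
```

In this toy form, the log geometric term acts as a prior that suppresses attention between distant boxes, so two far-apart objects barely exchange information. A trained network would replace both hand-set terms with learned embeddings and feed richer pair features to a learned scorer.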

Cite

Text

Xu et al. "Spatial-Temporal Relation Networks for Multi-Object Tracking." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00409

Markdown

[Xu et al. "Spatial-Temporal Relation Networks for Multi-Object Tracking." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/xu2019iccv-spatialtemporal/) doi:10.1109/ICCV.2019.00409

BibTeX

@inproceedings{xu2019iccv-spatialtemporal,
  title     = {{Spatial-Temporal Relation Networks for Multi-Object Tracking}},
  author    = {Xu, Jiarui and Cao, Yue and Zhang, Zheng and Hu, Han},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2019},
  doi       = {10.1109/ICCV.2019.00409},
  url       = {https://mlanthology.org/iccv/2019/xu2019iccv-spatialtemporal/}
}