Simple Cues Lead to a Strong Multi-Object Tracker

Abstract

For a long time, the most common paradigm in Multi-Object Tracking was tracking-by-detection (TbD), where objects are first detected and then associated over video frames. For association, most models resorted to motion and appearance cues, e.g., re-identification networks. Recent approaches based on attention propose to learn the cues in a data-driven manner, showing impressive results. In this paper, we ask ourselves whether simple good old TbD methods are also capable of achieving the performance of end-to-end models. To this end, we propose two key ingredients that allow a standard re-identification network to excel at appearance-based tracking. We extensively analyse its failure cases, and show that a combination of our appearance features with a simple motion model leads to strong tracking results. Our tracker generalizes to four public datasets, namely MOT17, MOT20, BDD100k, and DanceTrack, achieving state-of-the-art performance. https://github.com/dvl-tum/GHOST
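The abstract describes the tracking-by-detection pipeline: detections from each frame are associated with existing tracks using appearance (re-identification) and motion cues. Below is a minimal, hypothetical sketch of one such association step, not the authors' GHOST implementation: it blends cosine distance between re-ID embeddings with an IoU-based motion cost and solves the assignment with the Hungarian algorithm. All weights, thresholds, and function names are illustrative assumptions.

```python
# Hypothetical tracking-by-detection association step (not the GHOST code):
# appearance cost = cosine distance between re-ID embeddings,
# motion cost = 1 - IoU between track boxes and detections,
# matching via the Hungarian algorithm. Weights/thresholds are assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(boxes_a, boxes_b):
    """Pairwise IoU for boxes in (x1, y1, x2, y2) format."""
    x1 = np.maximum(boxes_a[:, None, 0], boxes_b[None, :, 0])
    y1 = np.maximum(boxes_a[:, None, 1], boxes_b[None, :, 1])
    x2 = np.minimum(boxes_a[:, None, 2], boxes_b[None, :, 2])
    y2 = np.minimum(boxes_a[:, None, 3], boxes_b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)


def associate(track_feats, track_boxes, det_feats, det_boxes,
              w_app=0.7, w_motion=0.3, max_cost=0.7):
    """Match existing tracks to new detections for one frame."""
    # Appearance cost: cosine distance between L2-normalized embeddings.
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    app_cost = 1.0 - t @ d.T
    # Motion cost: 1 - IoU between track boxes and detection boxes.
    motion_cost = 1.0 - iou(track_boxes, det_boxes)
    cost = w_app * app_cost + w_motion * motion_cost
    rows, cols = linear_sum_assignment(cost)
    # Keep only sufficiently cheap matches; the rest stay unmatched.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
```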

Cite

Text

Seidenschwarz et al. "Simple Cues Lead to a Strong Multi-Object Tracker." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01327

Markdown

[Seidenschwarz et al. "Simple Cues Lead to a Strong Multi-Object Tracker." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/seidenschwarz2023cvpr-simple/) doi:10.1109/CVPR52729.2023.01327

BibTeX

@inproceedings{seidenschwarz2023cvpr-simple,
  title     = {{Simple Cues Lead to a Strong Multi-Object Tracker}},
  author    = {Seidenschwarz, Jenny and Brasó, Guillem and Serrano, Víctor Castro and Elezi, Ismail and Leal-Taixé, Laura},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {13813--13823},
  doi       = {10.1109/CVPR52729.2023.01327},
  url       = {https://mlanthology.org/cvpr/2023/seidenschwarz2023cvpr-simple/}
}