Contrastive Learning for Multi-Object Tracking with Transformers

Abstract

The DEtection TRansformer (DETR) opened new possibilities for object detection by modeling it as a translation task: converting image features into object-level representations. Previous works typically add expensive modules to DETR to perform Multi-Object Tracking (MOT), resulting in more complicated architectures. We instead show how DETR can be turned into an MOT model by employing an instance-level contrastive loss, a revised sampling strategy, and a lightweight assignment method. Our training scheme learns object appearances while preserving detection capabilities, with little overhead. Its performance surpasses the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset and is comparable to existing transformer-based methods on the MOT17 dataset.
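To make the idea of an instance-level contrastive loss concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes a generic supervised-contrastive (InfoNCE-style) formulation in which embeddings of the same object instance across frames are pulled together and embeddings of different instances are pushed apart. The function names, the `temperature` value, and the toy embeddings are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def instance_contrastive_loss(embeddings, instance_ids, temperature=0.1):
    """Generic supervised contrastive loss over object embeddings.

    Assumption: each embedding is one object-level feature vector and
    `instance_ids[i]` is the identity of the object it describes.
    Anchors without any positive (no other embedding with the same
    identity) are skipped.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    losses = []
    for i in range(n):
        positives = [
            j for j in range(n)
            if j != i and instance_ids[j] == instance_ids[i]
        ]
        if not positives:
            continue
        # Cosine similarity of the anchor to every other embedding, scaled.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n) if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor embeddings.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability of the positives.
        losses.append(
            -sum(logits[p] - log_denom for p in positives) / len(positives)
        )
    return sum(losses) / len(losses) if losses else 0.0


# Toy example: two object identities, two observations of each.
ids = [1, 1, 2, 2]
separated = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
loss_separated = instance_contrastive_loss(separated, ids)
loss_mixed = instance_contrastive_loss(mixed, ids)
```

In this toy setup, `loss_separated` is smaller than `loss_mixed`, because embeddings that cluster by identity are exactly what the loss rewards. At inference, such embeddings could then be matched across frames by similarity, which is one plausible reading of the lightweight assignment step mentioned in the abstract.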

Cite

Text

De Plaen et al. "Contrastive Learning for Multi-Object Tracking with Transformers." Winter Conference on Applications of Computer Vision, 2024.

Markdown

[De Plaen et al. "Contrastive Learning for Multi-Object Tracking with Transformers." Winter Conference on Applications of Computer Vision, 2024.](https://mlanthology.org/wacv/2024/plaen2024wacv-contrastive/)

BibTeX

@inproceedings{plaen2024wacv-contrastive,
  title     = {{Contrastive Learning for Multi-Object Tracking with Transformers}},
  author    = {De Plaen, Pierre-François and Marinello, Nicola and Proesmans, Marc and Tuytelaars, Tinne and Van Gool, Luc},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2024},
  pages     = {6867--6877},
  url       = {https://mlanthology.org/wacv/2024/plaen2024wacv-contrastive/}
}