MOTS: Multi-Object Tracking and Segmentation

Abstract

This paper extends the popular task of multi-object tracking to multi-object tracking and segmentation (MOTS). Towards this goal, we create dense pixel-level annotations for two existing tracking datasets using a semi-automatic annotation procedure. Our new annotations comprise 65,213 pixel masks for 977 distinct objects (cars and pedestrians) in 10,870 video frames. For evaluation, we extend existing multi-object tracking metrics to this new task. Moreover, we propose a new baseline method which jointly addresses detection, tracking, and segmentation with a single convolutional network. We demonstrate the value of our datasets by achieving improvements in performance when training on MOTS annotations. We believe that our datasets, metrics and baseline will become a valuable resource towards developing multi-object tracking approaches that go beyond 2D bounding boxes. We make our annotations, code, and models available at https://www.vision.rwth-aachen.de/page/mots.

Cite

Text

Voigtlaender et al. "MOTS: Multi-Object Tracking and Segmentation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. doi:10.1109/CVPR.2019.00813

Markdown

[Voigtlaender et al. "MOTS: Multi-Object Tracking and Segmentation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.](https://mlanthology.org/cvpr/2019/voigtlaender2019cvpr-mots/) doi:10.1109/CVPR.2019.00813

BibTeX

@inproceedings{voigtlaender2019cvpr-mots,
  title     = {{MOTS: Multi-Object Tracking and Segmentation}},
  author    = {Voigtlaender, Paul and Krause, Michael and Osep, Aljosa and Luiten, Jonathon and Sekar, Berin Balachandar Gnana and Geiger, Andreas and Leibe, Bastian},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2019},
  doi       = {10.1109/CVPR.2019.00813},
  url       = {https://mlanthology.org/cvpr/2019/voigtlaender2019cvpr-mots/}
}