Video Instance Segmentation 2019: A Winning Approach for Combined Detection, Segmentation, Classification and Tracking
Abstract
Video Instance Segmentation (VIS) is the task of localizing all objects in a video, segmenting them, tracking them throughout the video and classifying them into a set of predefined classes. In this work, we divide VIS into four parts: detection, segmentation, tracking and classification. We then develop algorithms for performing each of these four subtasks individually, and combine them into a complete solution for VIS. Our solution is an adaptation of UnOVOST, the current best-performing algorithm for Unsupervised Video Object Segmentation, to the VIS task. We benchmark our algorithm on the 2019 YouTube-VIS Challenge, where we obtain first place with an mAP score of 46.7%.
Cite
Text
Luiten et al. "Video Instance Segmentation 2019: A Winning Approach for Combined Detection, Segmentation, Classification and Tracking." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00088
Markdown
[Luiten et al. "Video Instance Segmentation 2019: A Winning Approach for Combined Detection, Segmentation, Classification and Tracking." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/luiten2019iccvw-video/) doi:10.1109/ICCVW.2019.00088
BibTeX
@inproceedings{luiten2019iccvw-video,
title = {{Video Instance Segmentation 2019: A Winning Approach for Combined Detection, Segmentation, Classification and Tracking}},
author = {Luiten, Jonathon and Torr, Philip H. S. and Leibe, Bastian},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2019},
pages = {709-712},
doi = {10.1109/ICCVW.2019.00088},
url = {https://mlanthology.org/iccvw/2019/luiten2019iccvw-video/}
}