An Empirical Study of Detection-Based Video Instance Segmentation
Abstract
Video instance segmentation (VIS) is a composite task that requires the joint detection, tracking, and segmentation of objects in a video. In this work, we introduce a complete framework for VIS that integrates the strengths of instance segmentation and general object tracking to address the unique challenges of VIS. In developing the framework, we investigate effective ways of coordinating the two components for maximum benefit while thoroughly investigating their separate contributions. Our approach improves over the official baseline by an absolute 14.4% in mAP and achieves second place in the 2019 YouTubeVIS challenge.
Cite
Text
Wang et al. "An Empirical Study of Detection-Based Video Instance Segmentation." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00089
Markdown
[Wang et al. "An Empirical Study of Detection-Based Video Instance Segmentation." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/wang2019iccvw-empirical/) doi:10.1109/ICCVW.2019.00089
BibTeX
@inproceedings{wang2019iccvw-empirical,
title = {{An Empirical Study of Detection-Based Video Instance Segmentation}},
author = {Wang, Qiang and He, Yi and Yang, Xiaoyun and Yang, Zhao and Torr, Philip H. S.},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2019},
pages = {713--716},
doi = {10.1109/ICCVW.2019.00089},
url = {https://mlanthology.org/iccvw/2019/wang2019iccvw-empirical/}
}