An Empirical Study of Detection-Based Video Instance Segmentation

Abstract

Video instance segmentation (VIS) is a composite task that requires the joint detection, tracking, and segmentation of objects in a video. In this work, we introduce a complete framework for VIS that integrates the strengths of instance segmentation and general object tracking to address the unique challenges of VIS. In developing the framework, we investigate effective ways of coordinating the two components for maximum benefit while thoroughly investigating their separate contributions. Our approach improves over the official baseline by an absolute 14.4% in mAP and achieves second place in the 2019 YouTubeVIS challenge.

Cite

Text

Wang et al. "An Empirical Study of Detection-Based Video Instance Segmentation." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00089

Markdown

[Wang et al. "An Empirical Study of Detection-Based Video Instance Segmentation." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/wang2019iccvw-empirical/) doi:10.1109/ICCVW.2019.00089

BibTeX

@inproceedings{wang2019iccvw-empirical,
  title     = {{An Empirical Study of Detection-Based Video Instance Segmentation}},
  author    = {Wang, Qiang and He, Yi and Yang, Xiaoyun and Yang, Zhao and Torr, Philip H. S.},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {713--716},
  doi       = {10.1109/ICCVW.2019.00089},
  url       = {https://mlanthology.org/iccvw/2019/wang2019iccvw-empirical/}
}