Dual Embedding Learning for Video Instance Segmentation

Abstract

In this paper, we propose a novel two-stage framework for generating high-quality segmentation results on the video instance segmentation task, which requires simultaneous detection, segmentation, and tracking of instances. To address this multi-task problem efficiently, we first select high-quality detection proposals in each frame and calibrate the category of each proposal with the global context of the video. Each selected proposal is then extended temporally by a bi-directional Instance-Pixel Dual-Tracker (IPDT), which synchronizes tracking at both the instance level and the pixel level. The instance-level module concentrates on distinguishing the target instance from other objects, while the pixel-level module focuses on the local features of the instance. Our proposed method achieved a competitive result of 45.0% mAP on the YouTube-VOS dataset, ranking 3rd in Track 2 of the 2nd Large-scale Video Object Segmentation Challenge.

Cite

Text

Feng et al. "Dual Embedding Learning for Video Instance Segmentation." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00090

Markdown

[Feng et al. "Dual Embedding Learning for Video Instance Segmentation." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/feng2019iccvw-dual/) doi:10.1109/ICCVW.2019.00090

BibTeX

@inproceedings{feng2019iccvw-dual,
  title     = {{Dual Embedding Learning for Video Instance Segmentation}},
  author    = {Feng, Qianyu and Yang, Zongxin and Li, Peike and Wei, Yunchao and Yang, Yi},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {717--720},
  doi       = {10.1109/ICCVW.2019.00090},
  url       = {https://mlanthology.org/iccvw/2019/feng2019iccvw-dual/}
}