Video Instance Segmentation with a Propose-Reduce Paradigm
Abstract
Video instance segmentation (VIS) aims to segment and associate all instances of predefined classes for each frame in videos. Prior methods usually obtain segmentation for a frame or clip first, and then merge the incomplete results by tracking or matching. These methods may cause error accumulation in the merging step. In contrast, we propose a new paradigm, Propose-Reduce, to generate complete sequences for input videos in a single step. We further build a sequence propagation head on an existing image-level instance segmentation network for long-term propagation. To ensure robustness and high recall of our proposed framework, multiple sequences are proposed, and redundant sequences of the same instance are reduced. We achieve state-of-the-art performance on two representative benchmark datasets: 47.6% AP on the YouTube-VIS validation set and 70.4% J&F on the DAVIS-UVOS validation set.
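The "reduce" step mentioned in the abstract can be illustrated with a minimal sketch, assuming redundancy is measured by a spatio-temporal mask IoU and resolved by greedy non-maximum suppression over whole sequences. The function names `sequence_iou` and `reduce_sequences`, the `(T, H, W)` boolean mask layout, and the `iou_threshold` value are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np


def sequence_iou(a, b):
    """IoU of two boolean mask sequences of shape (T, H, W), pooled over all frames."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def reduce_sequences(sequences, scores, iou_threshold=0.5):
    """Greedy suppression over whole sequences (hypothetical helper).

    Visits sequences from highest to lowest score and keeps one only if it
    overlaps every already-kept sequence by less than iou_threshold.
    Returns the indices of the kept sequences, in order of descending score.
    """
    order = sorted(range(len(sequences)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(sequence_iou(sequences[i], sequences[k]) < iou_threshold for k in kept):
            kept.append(i)
    return kept
```

Under these assumptions, two proposed sequences that cover nearly the same pixels across the video collapse into the higher-scoring one, while sequences of different instances survive.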
Cite
Text
Lin et al. "Video Instance Segmentation with a Propose-Reduce Paradigm." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00176
Markdown
[Lin et al. "Video Instance Segmentation with a Propose-Reduce Paradigm." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/lin2021iccv-video/) doi:10.1109/ICCV48922.2021.00176
BibTeX
@inproceedings{lin2021iccv-video,
title = {{Video Instance Segmentation with a Propose-Reduce Paradigm}},
author = {Lin, Huaijia and Wu, Ruizheng and Liu, Shu and Lu, Jiangbo and Jia, Jiaya},
booktitle = {International Conference on Computer Vision},
year = {2021},
pages = {1739-1748},
doi = {10.1109/ICCV48922.2021.00176},
url = {https://mlanthology.org/iccv/2021/lin2021iccv-video/}
}