SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation

Abstract

Single-stage instance segmentation approaches have recently gained popularity due to their speed and simplicity, but still lag behind two-stage methods in accuracy. We propose a fast single-stage instance segmentation method, called SipMask, that preserves instance-specific spatial information by separating the mask prediction of an instance into different sub-regions of a detected bounding-box. Our main contribution is a novel light-weight spatial preservation (SP) module that generates a separate set of spatial coefficients for each sub-region within a bounding-box, leading to improved mask predictions. It also enables accurate delineation of spatially adjacent instances. Further, we introduce a mask alignment weighting loss and a feature alignment scheme to better correlate mask prediction with object detection. On COCO $\texttt{test-dev}$, SipMask outperforms the existing single-stage methods. Compared to the state-of-the-art single-stage TensorMask, SipMask obtains an absolute gain of 1.0% (mask AP), while providing a four-fold speedup. In terms of real-time capabilities, SipMask outperforms YOLACT with an absolute gain of 3.0% (mask AP) under similar settings, while operating at comparable speed on a Titan Xp. We also evaluate SipMask for real-time video instance segmentation, achieving promising results on the YouTube-VIS dataset. The source code is available at https://github.com/JialeCao001/SipMask.
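The idea of a separate set of coefficients per sub-region of a bounding-box can be illustrated with a small sketch. This is not the authors' implementation. It assumes a YOLACT-style linear combination of shared basis maps, a 2x2 grid of sub-regions, and a sigmoid on the combined logits, none of which the abstract specifies. The function name `assemble_mask` and all argument layouts are hypothetical.

```python
import math


def assemble_mask(basis, box, coeffs):
    """Combine basis maps inside a box using per-sub-region coefficients.

    basis:  k maps, each an H x W nested list (assumed shared basis maps).
    box:    (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    coeffs: rows x cols grid, each cell holding k coefficients (assumed layout).
    """
    k, height, width = len(basis), len(basis[0]), len(basis[0][0])
    rows, cols = len(coeffs), len(coeffs[0])
    x0, y0, x1, y1 = box
    mask = [[0.0] * width for _ in range(height)]  # zero outside the box
    for y in range(y0, y1):
        # Index of the sub-region row that contains this pixel.
        r = min((y - y0) * rows // (y1 - y0), rows - 1)
        for x in range(x0, x1):
            c = min((x - x0) * cols // (x1 - x0), cols - 1)
            # Each sub-region uses its own coefficient vector.
            logit = sum(coeffs[r][c][i] * basis[i][y][x] for i in range(k))
            mask[y][x] = 1.0 / (1.0 + math.exp(-logit))
    return mask


# Toy input: two constant basis maps and a 4x4 box split into 2x2 sub-regions.
basis = [[[1.0] * 8 for _ in range(8)] for _ in range(2)]
coeffs = [[[10.0, 10.0], [0.0, 0.0]],
          [[0.0, 0.0], [0.0, 0.0]]]
mask = assemble_mask(basis, (2, 2, 6, 6), coeffs)
```

In this toy input only the top-left sub-region has non-zero coefficients, so it is the only quadrant whose values approach 1, while the other three quadrants sit at the sigmoid midpoint and pixels outside the box stay at zero. A single coefficient vector for the whole box could not produce a different response in each quadrant from the same basis maps.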

Cite

Text

Cao et al. "SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58568-6_1

Markdown

[Cao et al. "SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/cao2020eccv-sipmask/) doi:10.1007/978-3-030-58568-6_1

BibTeX

@inproceedings{cao2020eccv-sipmask,
  title     = {{SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation}},
  author    = {Cao, Jiale and Anwer, Rao Muhammad and Cholakkal, Hisham and Khan, Fahad Shahbaz and Pang, Yanwei and Shao, Ling},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2020},
  doi       = {10.1007/978-3-030-58568-6_1},
  url       = {https://mlanthology.org/eccv/2020/cao2020eccv-sipmask/}
}