DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

Abstract

Binary grid mask representation is widely used in instance segmentation. A representative example is Mask R-CNN, which predicts masks on a 28×28 binary grid. In general, a low-resolution grid is not sufficient to capture details, while a high-resolution grid dramatically increases training complexity. In this paper, we propose a new mask representation that applies the discrete cosine transform (DCT) to encode a high-resolution binary grid mask into a compact vector. Our method, termed DCT-Mask, can be easily integrated into most pixel-based instance segmentation methods. Without any bells and whistles, DCT-Mask yields significant gains across different frameworks, backbones, datasets, and training schedules. It does not require any pre-processing or pre-training and has almost no impact on running speed. The improvement is larger for higher-quality annotations and more complex backbones. Moreover, we analyze the performance of our method from the perspective of mask representation quality. The main reason DCT-Mask works well is that it obtains a high-quality mask representation with low complexity.
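
The core idea of encoding a binary grid mask as a compact DCT vector can be illustrated with a minimal sketch. This is not the authors' implementation. The grid size (32), the number of retained coefficients (an 8×8 low-frequency block), the 0.5 decoding threshold, and all function names are assumptions chosen for illustration; the abstract does not specify how coefficients are selected. The transform is written in pure Python for self-containment, whereas a real pipeline would more likely call a library DCT routine.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def encode(mask, keep):
    """2D DCT of a square mask; keep the keep x keep low-frequency block.

    Assumption: a square block is retained. Other selection orders are possible.
    """
    c = dct_matrix(len(mask))
    coeffs = matmul(matmul(c, mask), transpose(c))
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


def decode(vector, keep, n):
    """Zero-fill the discarded coefficients, invert the DCT, threshold at 0.5."""
    coeffs = [[0.0] * n for _ in range(n)]
    for idx, value in enumerate(vector):
        coeffs[idx // keep][idx % keep] = value
    c = dct_matrix(n)
    recon = matmul(matmul(transpose(c), coeffs), c)
    return [[1 if x >= 0.5 else 0 for x in row] for row in recon]


def iou(a, b):
    """Intersection over union of two binary masks."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 1.0


# Hypothetical example: a filled disk on a 32 x 32 grid.
N, KEEP = 32, 8
mask = [[1 if (r - 16) ** 2 + (c - 16) ** 2 <= 100 else 0 for c in range(N)]
        for r in range(N)]

vector = encode(mask, KEEP)            # 64 numbers instead of 1024 pixels
restored = decode(vector, KEEP, N)
quality = iou(mask, restored)          # reconstruction quality of the compact code
lossless = decode(encode(mask, N), N, N)  # keeping every coefficient is exact
```

Here `quality` measures how well the 64-dimensional vector preserves the mask, which is the kind of representation-quality comparison the abstract refers to. Keeping all coefficients recovers the mask exactly because the transform is orthonormal.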

Cite

Text

Shen et al. "DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00861

Markdown

[Shen et al. "DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/shen2021cvpr-dctmask/) doi:10.1109/CVPR46437.2021.00861

BibTeX

@inproceedings{shen2021cvpr-dctmask,
  title     = {{DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation}},
  author    = {Shen, Xing and Yang, Jirui and Wei, Chunbo and Deng, Bing and Huang, Jianqiang and Hua, Xian-Sheng and Cheng, Xiaoliang and Liang, Kewei},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {8720-8729},
  doi       = {10.1109/CVPR46437.2021.00861},
  url       = {https://mlanthology.org/cvpr/2021/shen2021cvpr-dctmask/}
}