MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

Wang, Huiyu; Zhu, Yukun; Adam, Hartwig; Yuille, Alan; Chen, Liang-Chieh

doi:10.1109/CVPR46437.2021.00542

CVPR 2021 pp. 5463-5474

Abstract

We present MaX-DeepLab, the first end-to-end model for panoptic segmentation. Our approach simplifies the current pipeline that depends heavily on surrogate sub-tasks and hand-designed components, such as box detection, non-maximum suppression, thing-stuff merging, etc. Although these sub-tasks are tackled by area experts, they fail to comprehensively solve the target task. By contrast, our MaX-DeepLab directly predicts class-labeled masks with a mask transformer, and is trained with a panoptic quality inspired loss via bipartite matching. Our mask transformer employs a dual-path architecture that introduces a global memory path in addition to a CNN path, allowing direct communication with any CNN layers. As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time. A small variant of MaX-DeepLab improves 3.0% PQ over DETR with similar parameters and M-Adds. Furthermore, MaX-DeepLab, without test time augmentation, achieves new state-of-the-art 51.3% PQ on COCO test-dev set.
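The PQ figures quoted in the abstract refer to panoptic quality, the standard panoptic segmentation metric: the sum of IoUs over matched segment pairs, divided by TP + 0.5 FP + 0.5 FN, where a predicted and a ground-truth segment match when their IoU exceeds 0.5. The following is a minimal sketch of that metric for a single class, not the authors' training loss or evaluation code. The function name `panoptic_quality`, the representation of segments as sets of pixel coordinates, and the toy data are all assumptions made for illustration.

```python
def panoptic_quality(pred_segments, gt_segments, iou_threshold=0.5):
    """Single-class PQ sketch: sum of matched IoUs / (TP + 0.5*FP + 0.5*FN).

    Each segment is a set of (row, col) pixel coordinates (an illustrative
    representation, not the paper's). Segments within one list are assumed
    not to overlap, so an IoU above 0.5 identifies at most one match.
    """
    matched_gt = set()
    iou_sum = 0.0
    for pred in pred_segments:
        for j, gt in enumerate(gt_segments):
            if j in matched_gt:
                continue
            union = len(pred | gt)
            iou = len(pred & gt) / union if union else 0.0
            if iou > iou_threshold:
                matched_gt.add(j)
                iou_sum += iou
                break
    tp = len(matched_gt)
    fp = len(pred_segments) - tp  # predictions with no matching ground truth
    fn = len(gt_segments) - tp    # ground-truth segments left unmatched
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


# Toy 4x4 example: one good match, one spurious prediction, one missed segment.
gt_a = {(r, c) for r in range(2) for c in range(2)}    # 4 pixels
gt_b = {(3, c) for c in range(4)}                      # 4 pixels
pred_1 = {(r, c) for r in range(2) for c in range(3)}  # IoU with gt_a = 4/6
pred_2 = {(2, 3)}                                      # overlaps nothing

pq = panoptic_quality([pred_1, pred_2], [gt_a, gt_b])
print(round(pq, 3))  # 0.333
```

In this toy case the single match contributes an IoU of 2/3, and the denominator is 1 + 0.5 + 0.5 = 2, giving PQ = 1/3. Benchmark PQ values such as those in the abstract are averaged over classes and reported as percentages.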

Cite

Text

Wang et al. "MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00542

Markdown

[Wang et al. "MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/wang2021cvpr-maxdeeplab/) doi:10.1109/CVPR46437.2021.00542

BibTeX

@inproceedings{wang2021cvpr-maxdeeplab,
  title     = {{MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers}},
  author    = {Wang, Huiyu and Zhu, Yukun and Adam, Hartwig and Yuille, Alan and Chen, Liang-Chieh},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {5463-5474},
  doi       = {10.1109/CVPR46437.2021.00542},
  url       = {https://mlanthology.org/cvpr/2021/wang2021cvpr-maxdeeplab/}
}