Open-Vocabulary Panoptic Segmentation with Embedding Modulation

Abstract

Open-vocabulary segmentation is attracting increasing attention due to its critical real-world applications. Traditional closed-vocabulary segmentation methods cannot characterize novel objects, whereas several recent open-vocabulary attempts obtain unsatisfactory results, i.e., a notable performance drop in the closed-vocabulary setting and a massive demand for extra training data. To this end, we propose OPSNet, an omnipotent and data-efficient framework for Open-vocabulary Panoptic Segmentation. Specifically, the carefully designed Embedding Modulation module, together with several other components, enables adequate embedding enhancement and information exchange between the segmentation backbone and the visual-linguistically well-aligned CLIP encoder, resulting in superior segmentation performance under both open- and closed-vocabulary settings and a much smaller need for additional data. Extensive experimental evaluations are conducted across multiple datasets (e.g., COCO, ADE20K, Cityscapes, and PascalContext) under various circumstances, where the proposed OPSNet achieves state-of-the-art results, demonstrating the effectiveness and generality of the proposed approach. The code and trained models will be made publicly available.

Cite

Text

Chen et al. "Open-Vocabulary Panoptic Segmentation with Embedding Modulation." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00111

Markdown

[Chen et al. "Open-Vocabulary Panoptic Segmentation with Embedding Modulation." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/chen2023iccv-openvocabulary/) doi:10.1109/ICCV51070.2023.00111

BibTeX

@inproceedings{chen2023iccv-openvocabulary,
  title     = {{Open-Vocabulary Panoptic Segmentation with Embedding Modulation}},
  author    = {Chen, Xi and Li, Shuang and Lim, Ser-Nam and Torralba, Antonio and Zhao, Hengshuang},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {1141-1150},
  doi       = {10.1109/ICCV51070.2023.00111},
  url       = {https://mlanthology.org/iccv/2023/chen2023iccv-openvocabulary/}
}