Class Distribution-Induced Attention Map for Open-Vocabulary Semantic Segmentations

Abstract

Open-vocabulary semantic segmentation is a challenging task that assigns seen or unseen class labels to individual pixels. While recent works with vision-language models (VLMs) have shown promising results in zero-shot semantic segmentation, they still struggle to accurately localize class-related objects. In this work, we argue that CLIP-based prior works yield noisy patch-wise class predictions, even though the class distributions within each object are highly correlated. We therefore propose the Class Distribution-induced Attention Map, dubbed CDAM, which is generated from the Jensen-Shannon divergence between the class distributions of two patches that belong to the same (class) object. CDAM can be used for open-vocabulary semantic segmentation by integrating it into the final layer of CLIP, enhancing its ability to accurately localize the desired classes. Our class distribution-induced attention scheme works readily with multi-scale image patches as well as augmented text prompts to further enhance the attention maps. By exploiting the class distribution, we also propose a robust entropy-based background thresholding for semantic segmentation inference. Interestingly, the core idea of our proposed method does not conflict with prior art in zero-shot semantic segmentation, so the two can be used together synergistically, yielding substantial performance improvements across popular semantic segmentation benchmarks.
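
The role of the Jensen-Shannon divergence described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the temperature `tau`, and the choice to turn divergences into attention weights with a softmax over negative divergence are all assumptions made for illustration. The only ingredient taken from the abstract is that patches with similar class distributions (low divergence) should attend to each other.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def js_divergence_matrix(p, eps=1e-12):
    """Pairwise Jensen-Shannon divergence (in nats) between the rows of p.

    p has shape (num_patches, num_classes); the result is (num_patches, num_patches).
    Uses JS(p, q) = H(m) - 0.5 * H(p) - 0.5 * H(q) with m = (p + q) / 2.
    """
    p = np.clip(p, eps, 1.0)
    neg_entropy = (p * np.log(p)).sum(axis=1)            # -H(p_i), shape (N,)
    m = 0.5 * (p[:, None, :] + p[None, :, :])            # mixtures, shape (N, N, C)
    h_m = -(m * np.log(m)).sum(axis=2)                   # H(m_ij), shape (N, N)
    js = h_m + 0.5 * neg_entropy[:, None] + 0.5 * neg_entropy[None, :]
    return np.maximum(js, 0.0)                           # remove tiny negative rounding errors

def divergence_to_attention(js, tau=0.1):
    """Hypothetical mapping: low divergence -> high attention weight (rows sum to 1)."""
    return softmax(-js / tau, axis=1)

# Toy example: three patches, three candidate classes (logits are made up).
logits = np.array([[4.0, 0.0, 0.0],    # patch 0: mostly class 0
                   [3.5, 0.2, 0.0],    # patch 1: similar to patch 0
                   [0.0, 0.0, 4.0]])   # patch 2: mostly class 2
class_dist = softmax(logits, axis=1)
js = js_divergence_matrix(class_dist)
attn = divergence_to_attention(js)
```

In this toy case, patch 0 places more attention on patch 1 than on patch 2, because their class distributions are close even if the individual per-patch predictions were noisy.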

Cite

Text

Kang et al. "Class Distribution-Induced Attention Map for Open-Vocabulary Semantic Segmentations." International Conference on Learning Representations, 2025.

Markdown

[Kang et al. "Class Distribution-Induced Attention Map for Open-Vocabulary Semantic Segmentations." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/kang2025iclr-class/)

BibTeX

@inproceedings{kang2025iclr-class,
  title     = {{Class Distribution-Induced Attention Map for Open-Vocabulary Semantic Segmentations}},
  author    = {Kang, Dong Un and Kim, Hayeon and Chun, Se Young},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/kang2025iclr-class/}
}