Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Abstract

CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite its success, its application to OVSS faces challenges due to its initial image-level alignment training, which limits its performance in tasks requiring detailed local context. Our study delves into the impact of CLIP’s [CLS] token on patch feature correlations, revealing a dominance of “global” patches that hinders local feature discrimination. To overcome this, we propose CLIPtrase, a novel training-free semantic segmentation strategy that enhances local feature awareness through recalibrated self-correlation among patches. This approach demonstrates notable improvements in segmentation accuracy and in maintaining semantic coherence across objects. Experiments show that our method outperforms CLIP by 22.3% on average across 9 segmentation benchmarks, surpassing existing state-of-the-art training-free methods. The code is publicly available at https://github.com/leaves162/CLIPtrase.
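
The abstract only sketches the recalibration idea. The snippet below is a minimal illustrative sketch, not the authors' implementation: it computes a self-correlation (cosine similarity) matrix among CLIP patch features and uses it as attention-like weights to refine them. The function names and the temperature value are assumptions introduced here for illustration.

import torch
import torch.nn.functional as F

def patch_self_correlation(patch_feats: torch.Tensor) -> torch.Tensor:
    # patch_feats: (N, D) patch embeddings from CLIP's vision transformer
    # (the [CLS] token excluded). Returns an (N, N) cosine-similarity matrix.
    normed = F.normalize(patch_feats, dim=-1)
    return normed @ normed.t()

def recalibrate_patches(patch_feats: torch.Tensor, temperature: float = 0.01) -> torch.Tensor:
    # Turn the self-correlation matrix into attention-like weights and use them
    # to aggregate semantically similar patches, sharpening local features.
    corr = patch_self_correlation(patch_feats)
    weights = torch.softmax(corr / temperature, dim=-1)
    return weights @ patch_feats

if __name__ == "__main__":
    feats = torch.randn(196, 512)          # e.g. a 14x14 patch grid with 512-dim features
    refined = recalibrate_patches(feats)
    print(refined.shape)                   # torch.Size([196, 512])

The refined patch features can then be compared against text embeddings of class names, as in the usual CLIP zero-shot setup, to obtain per-patch predictions without any training.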

Cite

Text

Shao et al. "Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73016-0_9

Markdown

[Shao et al. "Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/shao2024eccv-explore/) doi:10.1007/978-3-031-73016-0_9

BibTeX

@inproceedings{shao2024eccv-explore,
  title     = {{Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation}},
  author    = {Shao, Tong and Tian, Zhuotao and Zhao, Hang and Su, Jingyong},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73016-0_9},
  url       = {https://mlanthology.org/eccv/2024/shao2024eccv-explore/}
}