Revisit the Open Nature of Open Vocabulary Semantic Segmentation
Abstract
In Open Vocabulary Semantic Segmentation (OVS), we observe a consistent drop in model performance as the query vocabulary set expands, especially when it includes semantically similar and ambiguous vocabularies, such as ‘sofa’ and ‘couch’. The previous OVS evaluation protocol, however, does not account for such ambiguity, as any mismatch between model-predicted and human-annotated pairs is simply treated as incorrect on a pixel-wise basis. This contradicts the open nature of OVS, where ambiguous categories may both be correct from an open-world perspective. To address this, we study the open nature of OVS and propose a mask-wise evaluation protocol based on matched and mismatched mask pairs between prediction and annotation. Extensive experimental evaluations show that, from an open-world perspective, the proposed mask-wise protocol provides a more effective and reliable evaluation framework for OVS models than the previous pixel-wise approach. Moreover, analysis of mismatched mask pairs reveals that a large number of ambiguous categories exist in commonly used OVS datasets. Interestingly, we find that reducing these ambiguities during both training and inference enhances the capabilities of OVS models. These findings and the new evaluation protocol encourage further exploration of the open nature of OVS, as well as broader open-world challenges. Project page: https://qiming-huang.github.io/RevisitOVS/.
Cite
Text
Huang et al. "Revisit the Open Nature of Open Vocabulary Semantic Segmentation." International Conference on Learning Representations, 2025.
Markdown
[Huang et al. "Revisit the Open Nature of Open Vocabulary Semantic Segmentation." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/huang2025iclr-revisit/)
BibTeX
@inproceedings{huang2025iclr-revisit,
title = {{Revisit the Open Nature of Open Vocabulary Semantic Segmentation}},
author = {Huang, Qiming and Hu, Han and Jiao, Jianbo},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/huang2025iclr-revisit/}
}