TFM^2: Training-Free Mask Matching for Open-Vocabulary Semantic Segmentation

Abstract

The potential of Open-Vocabulary Semantic Segmentation (OVSS) in few-shot scenarios has not been fully explored, owing to the complexity of extending few-shot concepts to semantic segmentation tasks. To address this challenge, we propose Training-Free Mask Matching (TFM^2), an efficient mask-based adapter method that enhances OVSS models for the few-shot open-vocabulary semantic segmentation task. TFM^2 is a key-value cache that is explicitly designed for image masks. We introduce three modules to construct and refine the mask cache, which in turn improves OVSS mask classification performance. Comprehensive experiments demonstrate that TFM^2 improves the performance of state-of-the-art OVSS methods by a margin of 1% to 5% across different settings. Moreover, TFM^2 is not limited to any specific method or backbone. This work underscores the importance and potential of few-shot data in OVSS and presents a significant step toward leveraging this potential.
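
The abstract does not spell out the three modules, so the following is only a minimal sketch of the general idea of a training-free key-value cache over mask embeddings, not the authors' implementation. It assumes that each few-shot mask has already been pooled into an embedding vector, that cached keys are compared to a query by cosine similarity, and that cache scores are simply added to the zero-shot logits of an OVSS model. All function names and the `alpha` and `beta` parameters are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_mask_cache(support_embeddings, support_labels, num_classes):
    """Build the cache: keys are normalized mask embeddings, values are one-hot labels."""
    keys = [l2_normalize(e) for e in support_embeddings]
    values = []
    for label in support_labels:
        one_hot = [0.0] * num_classes
        one_hot[label] = 1.0
        values.append(one_hot)
    return keys, values


def cache_logits(query_embedding, keys, values, beta=5.0):
    """Score each class by similarity-weighted votes from the cached masks."""
    q = l2_normalize(query_embedding)
    num_classes = len(values[0])
    logits = [0.0] * num_classes
    for key, value in zip(keys, values):
        affinity = sum(a * b for a, b in zip(q, key))  # cosine similarity
        weight = math.exp(-beta * (1.0 - affinity))    # sharpen near matches
        for c in range(num_classes):
            logits[c] += weight * value[c]
    return logits


def classify_mask(query_embedding, zero_shot_logits, keys, values,
                  alpha=1.0, beta=5.0):
    """Fuse zero-shot logits with cache logits; no parameters are trained."""
    cached = cache_logits(query_embedding, keys, values, beta)
    fused = [z + alpha * c for z, c in zip(zero_shot_logits, cached)]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return predicted, fused


# Toy example: two cached few-shot masks, one per class (assumed data).
keys, values = build_mask_cache([[1.0, 0.0], [0.0, 1.0]], [0, 1], num_classes=2)
predicted, fused = classify_mask([0.9, 0.1], [0.0, 0.5], keys, values)
```

In this toy example the zero-shot logits alone favor class 1, while the query embedding lies close to the cached class-0 mask, so the fused prediction switches to class 0. In this sketch `beta` controls how sharply similarity is weighted and `alpha` controls how much the cache contributes relative to the zero-shot scores.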

Cite

Text

Zhuo et al. "TFM^2: Training-Free Mask Matching for Open-Vocabulary Semantic Segmentation." Winter Conference on Applications of Computer Vision, 2025.

Markdown

[Zhuo et al. "TFM^2: Training-Free Mask Matching for Open-Vocabulary Semantic Segmentation." Winter Conference on Applications of Computer Vision, 2025.](https://mlanthology.org/wacv/2025/zhuo2025wacv-tfm/)

BibTeX

@inproceedings{zhuo2025wacv-tfm,
  title     = {{TFM^2: Training-Free Mask Matching for Open-Vocabulary Semantic Segmentation}},
  author    = {Zhuo, Yaoxin and Bessinger, Zachary and Wang, Lichen and Khosravan, Naji and Li, Baoxin and Kang, Sing Bing},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2025},
  pages     = {4693-4703},
  url       = {https://mlanthology.org/wacv/2025/zhuo2025wacv-tfm/}
}