SemiDAViL: Semi-Supervised Domain Adaptation with Vision-Language Guidance for Semantic Segmentation
Abstract
Domain Adaptation (DA) and Semi-supervised Learning (SSL) converge in Semi-supervised Domain Adaptation (SSDA), where the objective is to transfer knowledge from a source domain to a target domain using a combination of limited labeled target samples and abundant unlabeled target data. Although intuitive, a simple amalgamation of DA and SSL is suboptimal in semantic segmentation for two major reasons: (1) previous methods, while able to learn good segmentation boundaries, are prone to confusing classes with similar visual appearance due to limited supervision; and (2) a skewed and imbalanced training data distribution favors source representation learning while impeding the exploration of the limited information available about tail classes. Language guidance can serve as a pivotal semantic bridge, facilitating robust class discrimination and mitigating visual ambiguities by leveraging the rich semantic relationships encoded in pre-trained language models to enhance feature representations across domains. Therefore, we propose the first language-guided SSDA setting for semantic segmentation in this work. Specifically, we harness the semantic generalization capabilities inherent in vision-language models (VLMs) to establish a synergistic framework within the SSDA paradigm. To address the inherent class-imbalance challenges in long-tailed distributions, we introduce class-balanced segmentation loss formulations that effectively regularize the learning process. Through extensive experimentation across diverse domain adaptation scenarios, our approach demonstrates substantial performance improvements over contemporary state-of-the-art (SoTA) methodologies.
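The abstract does not spell out the class-balanced loss formulations, so the following is only a generic illustration of the idea, not the authors' method. It is a minimal sketch that assumes inverse-frequency class weighting applied to a per-pixel cross-entropy; the function names `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Weight each class by the inverse of its pixel frequency (mean weight = 1).

    Assumption: plain inverse-frequency weighting, chosen for illustration.
    """
    counts = Counter(labels)
    total = len(labels)
    # Classes absent from the sample are treated as count 1 to avoid division by zero.
    raw = [total / max(counts.get(c, 0), 1) for c in range(num_classes)]
    mean = sum(raw) / num_classes
    return [w / mean for w in raw]


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of per-pixel negative log-likelihoods.

    probs: one list of class probabilities per pixel.
    labels: ground-truth class index per pixel.
    """
    num = sum(w * -math.log(max(p[y], 1e-12))
              for p, y, w in zip(probs, labels, (weights[y] for y in labels)))
    den = sum(weights[y] for y in labels)
    return num / den


# Toy example: 8 pixels of a head class (0) and 2 pixels of a tail class (1).
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels, num_classes=2)
probs = [[0.5, 0.5] for _ in labels]  # an uninformative predictor
loss = weighted_cross_entropy(probs, labels, weights)
```

In this toy case the tail class receives the larger weight, so errors on rare-class pixels contribute more to the loss than they would under an unweighted average.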
Cite
Text
Basak and Yin. "SemiDAViL: Semi-Supervised Domain Adaptation with Vision-Language Guidance for Semantic Segmentation." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00917
Markdown
[Basak and Yin. "SemiDAViL: Semi-Supervised Domain Adaptation with Vision-Language Guidance for Semantic Segmentation." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/basak2025cvpr-semidavil/) doi:10.1109/CVPR52734.2025.00917
BibTeX
@inproceedings{basak2025cvpr-semidavil,
title = {{SemiDAViL: Semi-Supervised Domain Adaptation with Vision-Language Guidance for Semantic Segmentation}},
author = {Basak, Hritam and Yin, Zhaozheng},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {9816-9828},
doi = {10.1109/CVPR52734.2025.00917},
url = {https://mlanthology.org/cvpr/2025/basak2025cvpr-semidavil/}
}