Entangle-Then-Disentangle: A Novel Approach for Enhancing Large Vision-Language Model
Abstract
Large-scale foundation models, such as the contrastive language-image pre-training model and the align language model, have shown promising performance on downstream tasks. However, despite their accomplishments, these large-scale foundation models still exhibit limitations in handling certain out-of-distribution downstream tasks, especially in the field of few-shot domain adaptation (FSDA). Advanced works propose prompt learning to overcome the distribution shift. However, the existing methods mainly concentrate on learning universal prompts applicable across available domains, neglecting to learn domain-specific prompts for the target domain already known in FSDA tasks. To fill this gap, we propose a novel learning approach, termed entangle-then-disentangle (EntDi), where each domain is assigned a distinct prompt to model the domain knowledge. The insight is that visual features from two domains, once entangled into a single representation, could be disentangled by leveraging domain-specific knowledge. Specifically, EntDi first entangles visual features from two images of disparate labels and domains. Subsequently, EntDi learns domain-specific prompts by predicting labels of these entangled features, where the labels are contingent on the domain-specific prompt used for prediction. Comprehensive experiments verify the efficacy of the proposed prompt learning approach.
Cite
Text
Yuan et al. "Entangle-Then-Disentangle: A Novel Approach for Enhancing Large Vision-Language Model." Machine Learning, 2025. doi:10.1007/S10994-025-06811-3
Markdown
[Yuan et al. "Entangle-Then-Disentangle: A Novel Approach for Enhancing Large Vision-Language Model." Machine Learning, 2025.](https://mlanthology.org/mlj/2025/yuan2025mlj-entanglethendisentangle/) doi:10.1007/S10994-025-06811-3
BibTeX
@article{yuan2025mlj-entanglethendisentangle,
title = {{Entangle-Then-Disentangle: A Novel Approach for Enhancing Large Vision-Language Model}},
author = {Yuan, Jiajun and Zheng, Haiting and Yu, Hang and Luo, Xiangfeng},
journal = {Machine Learning},
year = {2025},
pages = {171},
doi = {10.1007/S10994-025-06811-3},
volume = {114},
url = {https://mlanthology.org/mlj/2025/yuan2025mlj-entanglethendisentangle/}
}