Semi-Supervised Fine-Tuning of Vision Foundation Models with Content-Style Decomposition
Abstract
In this paper, we present a semi-supervised fine-tuning approach designed to improve the performance of pre-trained foundation models on downstream tasks with limited labeled data. By leveraging content-style decomposition within an information-theoretic framework, our method enhances the latent representations of pre-trained vision foundation models, aligning them more effectively with specific task objectives and addressing the problem of distribution shift. We evaluate our approach on multiple datasets, including MNIST, its augmented variations (with yellow and white stripes), CIFAR-10, SVHN, and GalaxyMNIST. The experiments show improvements over the supervised fine-tuning baseline of pre-trained models, particularly in low-labeled-data regimes, across both frozen and trainable backbones for the majority of the tested datasets.
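As a rough illustration of the frozen-backbone setting described in the abstract, the sketch below splits a backbone embedding into a "content" code (task-relevant) and a "style" code (nuisance), trains a classifier on the content code with the labeled data, and adds a consistency term on the content codes of unlabeled data. This is an assumption-laden sketch, not the authors' exact method: `ContentStyleHead`, `semi_supervised_loss`, the cosine-consistency term, and the loss weighting are all hypothetical choices made for this example.

```python
# Minimal sketch (assumptions, not the paper's exact method): a frozen
# pre-trained backbone feeds a small head that separates content from style.
# Labeled images give a cross-entropy loss on the content code; unlabeled
# images give a consistency loss between the content codes of two views.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContentStyleHead(nn.Module):
    """Splits a backbone embedding into content and style codes."""

    def __init__(self, emb_dim, content_dim, style_dim, n_classes):
        super().__init__()
        self.to_content = nn.Linear(emb_dim, content_dim)
        self.to_style = nn.Linear(emb_dim, style_dim)
        self.classifier = nn.Linear(content_dim, n_classes)

    def forward(self, emb):
        content = self.to_content(emb)
        style = self.to_style(emb)
        return content, style, self.classifier(content)


def semi_supervised_loss(head, backbone, x_lab, y_lab, x_unl_a, x_unl_b,
                         unl_weight=1.0):
    """Supervised CE on labeled content + content consistency across two
    augmented views of the same unlabeled images (frozen-backbone setting)."""
    with torch.no_grad():  # frozen backbone: only the head is trained
        e_lab = backbone(x_lab)
        e_a, e_b = backbone(x_unl_a), backbone(x_unl_b)
    _, _, logits = head(e_lab)
    sup = F.cross_entropy(logits, y_lab)
    c_a, _, _ = head(e_a)
    c_b, _, _ = head(e_b)
    cons = 1.0 - F.cosine_similarity(c_a, c_b, dim=-1).mean()
    return sup + unl_weight * cons


if __name__ == "__main__":
    # Stand-in backbone and random MNIST-shaped data, just to show the call.
    backbone = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256))
    head = ContentStyleHead(emb_dim=256, content_dim=64, style_dim=64,
                            n_classes=10)
    x_lab, y_lab = torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,))
    x_a, x_b = torch.randn(32, 1, 28, 28), torch.randn(32, 1, 28, 28)
    loss = semi_supervised_loss(head, backbone, x_lab, y_lab, x_a, x_b)
    loss.backward()  # gradients reach the head only; the backbone stays frozen
    print(float(loss))
```

For the trainable-backbone setting the paper also evaluates, one would presumably drop the `torch.no_grad()` context and optimize the backbone as well, typically with a smaller learning rate than the head.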
Cite
Text
Drozdova et al. "Semi-Supervised Fine-Tuning of Vision Foundation Models with Content-Style Decomposition." NeurIPS 2024 Workshops: FITML, 2024.
Markdown
[Drozdova et al. "Semi-Supervised Fine-Tuning of Vision Foundation Models with Content-Style Decomposition." NeurIPS 2024 Workshops: FITML, 2024.](https://mlanthology.org/neuripsw/2024/drozdova2024neuripsw-semisupervised/)
BibTeX
@inproceedings{drozdova2024neuripsw-semisupervised,
  title     = {{Semi-Supervised Fine-Tuning of Vision Foundation Models with Content-Style Decomposition}},
  author    = {Drozdova, Mariia and Kinakh, Vitaliy and Belousov, Yury and Lastufka, Erica and Voloshynovskiy, Slava},
  booktitle = {NeurIPS 2024 Workshops: FITML},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/drozdova2024neuripsw-semisupervised/}
}