Active Prompt Learning with Vision-Language Model Priors

Abstract

Vision-language models (VLMs) have demonstrated remarkable zero-shot performance across various classification tasks. Nonetheless, their reliance on hand-crafted text prompts for each task hinders efficient adaptation to new tasks. While prompt learning offers a promising solution, most studies focus on maximizing the utilization of a given few-shot labeled dataset, often overlooking the potential of careful data selection strategies, which can achieve higher accuracy with less labeled data. This motivates us to study a budget-efficient active prompt learning framework. Specifically, we introduce a class-guided clustering that leverages the pre-trained image and text encoders of VLMs, thereby enabling our cluster-balanced acquisition function from the initial round of active learning. Furthermore, considering the substantial class-wise variance in confidence exhibited by VLMs, we propose a budget-saving selective querying strategy based on adaptive class-wise thresholds. Extensive experiments in active learning scenarios across seven datasets demonstrate that our method outperforms existing baselines.
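
The abstract describes two components: class-guided clustering with a cluster-balanced acquisition function, and selective querying governed by adaptive class-wise thresholds. The sketch below is a minimal illustration of how such a pipeline could be wired together, assuming precomputed, L2-normalized CLIP-style image and text embeddings. The function names, the use of k-means within each pseudo-class, the round-robin selection rule, and the per-class quantile threshold are all our own illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative sketch only -- NOT the paper's exact algorithm.
# Assumes image_feats (N, D) and text_feats (C, D) are L2-normalized
# embeddings from a frozen VLM (e.g., CLIP) image and text encoder.
import numpy as np
from sklearn.cluster import KMeans


def class_guided_clusters(image_feats, text_feats, clusters_per_class=5, seed=0):
    """Assign each unlabeled image to its zero-shot pseudo-class, then
    cluster images within each pseudo-class (one possible instantiation
    of 'class-guided clustering')."""
    sims = image_feats @ text_feats.T              # (N, C) cosine similarities
    pseudo_labels = sims.argmax(axis=1)            # zero-shot pseudo-classes
    logits = sims * 100.0                          # CLIP-style temperature
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    confidences = probs.max(axis=1) / probs.sum(axis=1)

    cluster_ids = np.full(len(image_feats), -1)
    next_id = 0
    for c in np.unique(pseudo_labels):
        idx = np.where(pseudo_labels == c)[0]
        k = min(clusters_per_class, len(idx))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(image_feats[idx])
        cluster_ids[idx] = km.labels_ + next_id
        next_id += k
    return pseudo_labels, cluster_ids, confidences


def cluster_balanced_acquisition(cluster_ids, confidences, budget):
    """Pick low-confidence samples cluster by cluster, round-robin, until
    the budget is spent (one plausible reading of 'cluster-balanced')."""
    clusters = np.unique(cluster_ids)
    per_cluster = {c: np.where(cluster_ids == c)[0] for c in clusters}
    for c in clusters:  # sort each cluster's members by ascending confidence
        per_cluster[c] = per_cluster[c][np.argsort(confidences[per_cluster[c]])]

    selected, rank = [], 0
    while len(selected) < budget:
        progressed = False
        for c in clusters:
            if rank < len(per_cluster[c]) and len(selected) < budget:
                selected.append(per_cluster[c][rank])
                progressed = True
        if not progressed:
            break
        rank += 1
    return np.array(selected)


def selective_query_mask(pseudo_labels, confidences, quantile=0.8):
    """Skip the oracle for samples whose zero-shot confidence exceeds an
    adaptive class-wise threshold (here a per-class quantile, used as an
    assumed stand-in for the paper's thresholding rule)."""
    thresholds = {c: np.quantile(confidences[pseudo_labels == c], quantile)
                  for c in np.unique(pseudo_labels)}
    return np.array([confidences[i] < thresholds[pseudo_labels[i]]
                     for i in range(len(confidences))])
```

In such a setup, labels would be requested only for selected samples where the mask is True, with the remainder pseudo-labeled by the VLM, which is how a class-wise thresholding rule could save annotation budget across rounds.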

Cite

Text

Kim et al. "Active Prompt Learning with Vision-Language Model Priors." Transactions on Machine Learning Research, 2025.

Markdown

[Kim et al. "Active Prompt Learning with Vision-Language Model Priors." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/kim2025tmlr-active/)

BibTeX

@article{kim2025tmlr-active,
  title     = {{Active Prompt Learning with Vision-Language Model Priors}},
  author    = {Kim, Hoyoung and Jin, Seokhee and Sung, Changhwan and Kim, Jaechang and Ok, Jungseul},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/kim2025tmlr-active/}
}