TEST-V: TEst-Time Support-Set Tuning for Zero-Shot Video Classification

Abstract

Recently, adapting Vision Language Models (VLMs) to zero-shot visual classification by tuning class embeddings with a few prompts (Test-time Prompt Tuning, TPT) or by replacing class names with generated visual samples (a support-set) has shown promising results. However, TPT cannot avoid the semantic gap between modalities, while the support-set cannot be tuned. To this end, we combine the strengths of both approaches and propose a novel framework, namely TEst-time Support-set Tuning for zero-shot Video Classification (TEST-V). It first dilates the support-set with multiple prompts (Multi-prompting Support-set Dilation, MSD) and then erodes the support-set via learnable weights to mine key cues dynamically (Temporal-aware Support-set Erosion, TSE). Specifically, i) MSD expands the support samples for each class based on multiple prompts queried from LLMs to enrich the diversity of the support-set. ii) TSE tunes the support-set with factorized learnable weights according to temporal prediction consistency in a self-supervised manner, mining pivotal supporting cues for each class. TEST-V achieves state-of-the-art results across four benchmarks and shows good interpretability.
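The dilate-then-erode idea above can be sketched at test time as follows. This is a minimal illustrative sketch, not the paper's implementation: it assumes precomputed CLIP-like support and frame features, and the sigmoid gating, the additive class/prompt weight factorization, and the KL-based temporal-consistency loss are our assumptions standing in for the paper's exact formulation.

```python
# Hypothetical sketch of test-time support-set tuning in the TEST-V spirit.
# All feature tensors are stand-ins for precomputed VLM embeddings.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C, P, T, D = 5, 4, 8, 32  # classes, prompts per class, video frames, feature dim

# MSD: support-set dilated with P prompt-generated samples per class (assumed given).
support = F.normalize(torch.randn(C, P, D), dim=-1)
# One test video represented by T frame features (assumed given).
frames = F.normalize(torch.randn(T, D), dim=-1)

# TSE: factorized learnable erosion weights (class factor + prompt factor, an assumed factorization).
w_class = torch.zeros(C, 1, requires_grad=True)
w_prompt = torch.zeros(1, P, requires_grad=True)
opt = torch.optim.Adam([w_class, w_prompt], lr=0.1)

def class_logits():
    w = torch.sigmoid(w_class + w_prompt)        # (C, P) erosion weights in (0, 1)
    sim = frames @ support.reshape(C * P, D).T   # (T, C*P) frame-to-support similarity
    sim = sim.reshape(T, C, P)
    return (sim * w).sum(-1) / w.sum(-1)         # (T, C) weighted per-frame class scores

for _ in range(10):
    probs = class_logits().softmax(-1)           # per-frame class distributions
    mean_p = probs.mean(0, keepdim=True)         # video-level consensus prediction
    # Self-supervised objective: pull each frame's prediction toward the consensus,
    # i.e. reward temporal prediction consistency (illustrative choice of loss).
    loss = F.kl_div(probs.log(), mean_p.expand_as(probs), reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()

pred = class_logits().mean(0).argmax().item()    # final zero-shot class index
print(pred)
```

The erosion weights down-weight prompt-generated support samples whose cues conflict across frames, which is what gives the support-set its tunable, interpretable structure.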

Cite

Text

Yan et al. "TEST-V: TEst-Time Support-Set Tuning for Zero-Shot Video Classification." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/239

Markdown

[Yan et al. "TEST-V: TEst-Time Support-Set Tuning for Zero-Shot Video Classification." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/yan2025ijcai-test/) doi:10.24963/IJCAI.2025/239

BibTeX

@inproceedings{yan2025ijcai-test,
  title     = {{TEST-V: TEst-Time Support-Set Tuning for Zero-Shot Video Classification}},
  author    = {Yan, Rui and Wang, Jin and Qu, Hongyu and Du, Xiaoyu and Zhang, Dong and Tang, Jinhui and Tan, Tieniu},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {2143--2151},
  doi       = {10.24963/IJCAI.2025/239},
  url       = {https://mlanthology.org/ijcai/2025/yan2025ijcai-test/}
}