Reliable and Diverse Hierarchical Adapter for Zero-Shot Video Classification

Abstract

Adapting pre-trained vision-language models to downstream tasks has emerged as a novel paradigm for zero-shot learning. Existing test-time adaptation (TTA) methods such as TPT fine-tune visual or textual representations to accommodate downstream tasks, but they still incur expensive optimization costs. To avoid these costs, the Training-free Dynamic Adapter (TDA) maintains a cache of visual features for each category in a parameter-free manner and measures sample confidence by the prediction entropy of test samples. Inspired by TDA, this work aims to develop the first training-free adapter for zero-shot video classification. The key to extending TDA to the video domain is capturing the intrinsic temporal relationships within video data when constructing and maintaining the video cache. We propose a reliable and diverse Hierarchical Adapter for zero-shot video classification, which consists of a Frame-level Cache Refiner and a Video-level Cache Updater. Before a video sample enters its corresponding cache, it is refined at the frame level based on prediction entropy and temporal probability difference. Because the cache has limited capacity, we update it during inference according to the principle of diversity. Experiments on four popular video classification benchmarks demonstrate the effectiveness of the Hierarchical Adapter. The code is available at https://github.com/Gwxer/Hierarchical-Adapter.
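
The two ideas in the abstract, frame-level refinement and diversity-driven cache updates, can be illustrated with a rough sketch. This is not the authors' implementation: the function names, the thresholds, the use of the L1 distance between consecutive frames' class probabilities as the "temporal probability difference", and the rule of evicting the most redundant cache entry by cosine similarity are all assumptions made for illustration only.

```python
import numpy as np

def prediction_entropy(probs, eps=1e-12):
    """Shannon entropy of each row of class probabilities (lower = more confident)."""
    return -(probs * np.log(probs + eps)).sum(axis=-1)

def refine_frames(frame_probs, max_entropy, max_jump):
    """Hypothetical frame-level refinement.

    frame_probs: array of shape (num_frames, num_classes), rows sum to 1.
    Returns a boolean mask of frames to keep: confident frames whose
    prediction does not change abruptly from the previous frame.
    """
    entropy = prediction_entropy(frame_probs)
    # Assumed definition of temporal probability difference:
    # L1 distance between consecutive frames (0 for the first frame).
    diff = np.abs(np.diff(frame_probs, axis=0)).sum(axis=-1)
    jump = np.concatenate([[0.0], diff])
    return (entropy <= max_entropy) & (jump <= max_jump)

def update_cache_diverse(cache, feature, capacity):
    """Hypothetical diversity-based update for one category's cache.

    cache: list of unit-norm feature vectors; feature: new vector.
    When the cache would overflow, drop the candidate with the highest
    mean cosine similarity to the others (the most redundant one).
    """
    feature = feature / np.linalg.norm(feature)
    candidates = cache + [feature]
    if len(candidates) <= capacity:
        return candidates
    feats = np.stack(candidates)
    sim = feats @ feats.T            # cosine similarity (unit-norm rows)
    np.fill_diagonal(sim, 0.0)       # ignore self-similarity
    redundancy = sim.sum(axis=1) / (len(candidates) - 1)
    drop = int(np.argmax(redundancy))
    return [f for i, f in enumerate(candidates) if i != drop]
```

In this sketch, a frame is discarded if its prediction is uncertain or if it disagrees sharply with the preceding frame, and a full cache only admits a new feature when that feature is less redundant than some existing entry. The released code at the repository linked above is the authoritative reference for the actual criteria.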

Cite

Text

Ge et al. "Reliable and Diverse Hierarchical Adapter for Zero-Shot Video Classification." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/115

Markdown

[Ge et al. "Reliable and Diverse Hierarchical Adapter for Zero-Shot Video Classification." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/ge2025ijcai-reliable/) doi:10.24963/IJCAI.2025/115

BibTeX

@inproceedings{ge2025ijcai-reliable,
  title     = {{Reliable and Diverse Hierarchical Adapter for Zero-Shot Video Classification}},
  author    = {Ge, Wenxuan and Huang, Peng and Yan, Rui and Qu, Hongyu and Xie, Guosen and Shu, Xiangbo},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {1026--1034},
  doi       = {10.24963/IJCAI.2025/115},
  url       = {https://mlanthology.org/ijcai/2025/ge2025ijcai-reliable/}
}