Space-Time Prompting for Video Class-Incremental Learning

Abstract

Recently, prompt-based learning has made impressive progress on image class-incremental learning, but it remains underexplored in the video domain. In this paper, we fill this gap by learning multiple prompts on top of a powerful image-language pre-trained model, i.e., CLIP, adapting it to video class-incremental learning (VCIL). To this end, we present a space-time prompting approach (ST-Prompt) that contains two kinds of prompts: task-specific prompts and task-agnostic prompts. The task-specific prompts address the catastrophic forgetting problem by learning multi-grained prompts, i.e., spatial prompts, temporal prompts, and comprehensive prompts, for accurate task identification. The task-agnostic prompts maintain a globally shared prompt pool, which gives the pre-trained image model temporal perception abilities by exchanging context between frames. In this way, ST-Prompt transfers the rich knowledge in image-language pre-trained models to the VCIL task while optimizing only a small set of prompts. To evaluate ST-Prompt, we conduct extensive experiments on three standard benchmarks. The results show that ST-Prompt significantly surpasses state-of-the-art VCIL methods; in particular, it gains 9.06% on the HMDB51 dataset under the 1*25 stage setting.
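
As a rough illustration of why prompting keeps the number of optimized parameters small, the sketch below prepends a shared set of prompt tokens to every frame's token sequence. This is generic prompt tuning, not the ST-Prompt implementation: the function name `prepend_prompts`, the shapes (8 frames, 49 patch tokens, width 512, 4 prompts), and the use of NumPy in place of a real CLIP encoder are all assumptions made for illustration.

```python
import numpy as np

def prepend_prompts(frame_tokens, prompts):
    """Prepend the same prompt tokens to each frame's token sequence.

    frame_tokens: array of shape (T, N, D), T frames with N tokens of width D.
    prompts:      array of shape (K, D), K prompt tokens shared by all frames.
    Returns an array of shape (T, K + N, D).
    """
    num_frames = frame_tokens.shape[0]
    # Broadcast the shared prompts across the frame axis (no data copy yet).
    tiled = np.broadcast_to(prompts, (num_frames,) + prompts.shape)
    return np.concatenate([tiled, frame_tokens], axis=1)

# Hypothetical sizes, chosen only for this example.
rng = np.random.default_rng(0)
T, N, D, K = 8, 49, 512, 4
frame_tokens = rng.standard_normal((T, N, D))  # stand-in for frozen patch embeddings
prompts = 0.02 * rng.standard_normal((K, D))   # the only parameters that would be trained

tokens = prepend_prompts(frame_tokens, prompts)
trainable = prompts.size  # K * D values, independent of the backbone size
```

In a setup like this the backbone weights would stay frozen and only `prompts` would receive gradient updates, so the trainable parameter count depends on the number and width of the prompts rather than on the size of the pre-trained model.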

Cite

Text

Pei et al. "Space-Time Prompting for Video Class-Incremental Learning." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01096

Markdown

[Pei et al. "Space-Time Prompting for Video Class-Incremental Learning." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/pei2023iccv-spacetime/) doi:10.1109/ICCV51070.2023.01096

BibTeX

@inproceedings{pei2023iccv-spacetime,
  title     = {{Space-Time Prompting for Video Class-Incremental Learning}},
  author    = {Pei, Yixuan and Qing, Zhiwu and Zhang, Shiwei and Wang, Xiang and Zhang, Yingya and Zhao, Deli and Qian, Xueming},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {11932--11942},
  doi       = {10.1109/ICCV51070.2023.01096},
  url       = {https://mlanthology.org/iccv/2023/pei2023iccv-spacetime/}
}