Learning Conditional Space-Time Prompt Distributions for Video Class-Incremental Learning

Zou, Xiaohan; Ma, Wenchao; Zhao, Shu

doi:10.1109/CVPR52734.2025.00458

CVPR 2025 pp. 4862-4873

Abstract

Recent advances in prompt-based learning have significantly improved image and video class-incremental learning. However, the prompts learned by these methods often fail to capture the diverse and informative characteristics of videos, and struggle to generalize effectively to future tasks and classes. To address these challenges, this paper proposes modeling the distribution of space-time prompts conditioned on the input video using a diffusion model. This generative approach allows the proposed model to naturally handle the diverse characteristics of videos, leading to more robust prompt learning and enhanced generalization capabilities. Additionally, we develop a simple yet effective mechanism to transfer the token relationship modeling capabilities of pre-trained image transformers to spatio-temporal modeling in videos. Our approach has been thoroughly evaluated across four established benchmarks, showing remarkable improvements over existing state-of-the-art methods in video class-incremental learning.

Cite

Text

Zou et al. "Learning Conditional Space-Time Prompt Distributions for Video Class-Incremental Learning." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00458

Markdown

[Zou et al. "Learning Conditional Space-Time Prompt Distributions for Video Class-Incremental Learning." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/zou2025cvpr-learning/) doi:10.1109/CVPR52734.2025.00458

BibTeX

@inproceedings{zou2025cvpr-learning,
  title     = {{Learning Conditional Space-Time Prompt Distributions for Video Class-Incremental Learning}},
  author    = {Zou, Xiaohan and Ma, Wenchao and Zhao, Shu},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {4862--4873},
  doi       = {10.1109/CVPR52734.2025.00458},
  url       = {https://mlanthology.org/cvpr/2025/zou2025cvpr-learning/}
}