Efficient VideoMAE via Temporal Progressive Training

Abstract

Masked autoencoders (MAE) have recently been adapted for video recognition, setting new performance benchmarks. Nonetheless, the computational overhead of training VideoMAE remains a prominent challenge, often demanding extensive GPU resources and days of training. To improve the efficiency of VideoMAE training, this paper presents Temporal Progressive Training (TPT), a simple yet effective method that strategically introduces longer video clips over the course of training. Specifically, TPT decomposes the intricate task of long-clip reconstruction into a series of incremental sub-tasks, progressively transitioning from short to long video clips. Our extensive experiments demonstrate the efficacy and efficiency of TPT. For example, TPT reduces training costs by factors of 2x on Kinetics-400 and 3x on Something-Something V2, while maintaining the performance of VideoMAE. Furthermore, when given the same training budget, TPT consistently surpasses VideoMAE by 0.4-0.5% on Kinetics-400 and 0.2-0.6% on Something-Something V2.
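The core idea of a short-to-long curriculum can be sketched as a clip-length schedule. The sketch below is purely illustrative and does not reproduce the paper's actual stage boundaries or clip lengths, which are hypothetical here; it only shows the shape of a progressive schedule in which early epochs reconstruct short clips and later epochs reconstruct full-length clips.

```python
def clip_length_schedule(epoch, total_epochs, stages=(4, 8, 16)):
    """Return the number of frames to sample at a given training epoch.

    Training is split into len(stages) equal phases; each phase uses a
    longer clip than the last (e.g. 4 -> 8 -> 16 frames). The stage
    lengths here are illustrative assumptions, not the paper's values.
    """
    phase = min(epoch * len(stages) // total_epochs, len(stages) - 1)
    return stages[phase]


# Example: over a 90-epoch run, epochs 0-29 use 4-frame clips,
# epochs 30-59 use 8-frame clips, and epochs 60-89 use 16-frame clips.
schedule = [clip_length_schedule(e, 90) for e in range(90)]
```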

Cite

Text

Li et al. "Efficient VideoMAE via Temporal Progressive Training." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Li et al. "Efficient VideoMAE via Temporal Progressive Training." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/li2025cvprw-efficient/)

BibTeX

@inproceedings{li2025cvprw-efficient,
  title     = {{Efficient VideoMAE via Temporal Progressive Training}},
  author    = {Li, Xianhang and Wang, Peng and Li, Xinyu and Wang, Heng and Zhu, Hongru and Xie, Cihang},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {2659--2668},
  url       = {https://mlanthology.org/cvprw/2025/li2025cvprw-efficient/}
}