ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning

Duan, Zhongjie; Zhang, Hong; Zhou, Wenmeng; Chen, Cen; Li, Yaliang; Zhang, Yu; Chen, Yingda

doi:10.24963/IJCAI.2025/1118

IJCAI 2025 pp. 10063-10071

Abstract

Video synthesis has attracted significant attention recently, and video synthesis models have demonstrated the practical applicability of diffusion models in creating dynamic visual content. Despite these advances, extending video length remains constrained by computational resources, and most existing video synthesis models are limited to generating short clips. In this paper, we propose ExVideo, a novel post-tuning methodology for video synthesis models. The approach enhances current video synthesis models so that they can produce content over extended temporal durations at a lower training cost. In particular, we design an extension strategy for each of the common temporal model architectures: 3D convolution, temporal attention, and positional embedding. To evaluate the proposed post-tuning approach, we trained ExSVD, an extended model based on the Stable Video Diffusion model. Our approach enables the model to generate up to 5x its original number of frames, requiring only 1.5k GPU hours of training on a dataset of 40k videos. Importantly, the substantial increase in video length does not compromise the model's innate generalization capabilities, and the model shows its advantages in generating videos of diverse styles and resolutions. We have released the source code and the enhanced model publicly.
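
The abstract names positional embedding as one of the temporal components that must be extended, but does not say how. As a purely illustrative sketch, and not the authors' method, the snippet below shows one generic way to stretch a per-frame embedding table to 5x as many positions by linear interpolation. The function name `extend_positional_embedding` and the toy table are hypothetical and are not taken from the paper or its released code.

```python
def extend_positional_embedding(table, new_len):
    """Linearly resample a (frames x dim) embedding table to new_len rows.

    Generic interpolation, assumed here for illustration only; the paper's
    actual extension strategy is not described in the abstract.
    """
    old_len = len(table)
    if old_len < 2 or new_len < 2:
        raise ValueError("need at least two positions before and after")
    extended = []
    for i in range(new_len):
        # Map the new frame index onto the original range [0, old_len - 1].
        pos = i * (old_len - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        w = pos - lo  # interpolation weight toward the upper neighbour
        extended.append([(1 - w) * a + w * b
                         for a, b in zip(table[lo], table[hi])])
    return extended


# Toy table: 3 frame positions, 2 embedding dimensions (hypothetical values).
short_table = [[0.0, 1.0], [1.0, 0.0], [2.0, -1.0]]
# Stretch to 5x the original number of positions.
extended_table = extend_positional_embedding(short_table, 5 * len(short_table))
```

Interpolation keeps the first and last embeddings unchanged, so the extended table spans the same range the model saw originally; whether that is preferable to other initializations would need to be checked against the paper itself.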

Cite

Text

Duan et al. "ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/1118

Markdown

[Duan et al. "ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/duan2025ijcai-exvideo/) doi:10.24963/IJCAI.2025/1118

BibTeX

@inproceedings{duan2025ijcai-exvideo,
  title     = {{ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning}},
  author    = {Duan, Zhongjie and Zhang, Hong and Zhou, Wenmeng and Chen, Cen and Li, Yaliang and Zhang, Yu and Chen, Yingda},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {10063-10071},
  doi       = {10.24963/IJCAI.2025/1118},
  url       = {https://mlanthology.org/ijcai/2025/duan2025ijcai-exvideo/}
}