ExtDM: Distribution Extrapolation Diffusion Model for Video Prediction

Abstract

Video prediction is a challenging task because of its inherent uncertainty, especially when forecasting over a long period. To model the temporal dynamics, advanced methods benefit from the recent success of diffusion models and repeatedly refine the predicted future frames with a 3D spatiotemporal U-Net. However, there exists a gap between the present and the future, and the repeated usage of the U-Net brings a heavy computation burden. To address this, we propose ExtDM, a diffusion-based video prediction method that predicts future frames by extrapolating the present distribution of features. Specifically, our method consists of three components: (i) a motion autoencoder that conducts a bijective transformation between video frames and motion cues; (ii) a layered distribution adaptor module that extrapolates the present features under the guidance of a Gaussian distribution; and (iii) a 3D U-Net architecture specialized for jointly fusing guidance and features along the temporal dimension via spatiotemporal-window attention. Extensive experiments on five popular benchmarks covering short- and long-term video prediction verify the effectiveness of ExtDM.

Cite

Text

Zhang et al. "ExtDM: Distribution Extrapolation Diffusion Model for Video Prediction." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01827

Markdown

[Zhang et al. "ExtDM: Distribution Extrapolation Diffusion Model for Video Prediction." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/zhang2024cvpr-extdm/) doi:10.1109/CVPR52733.2024.01827

BibTeX

@inproceedings{zhang2024cvpr-extdm,
  title     = {{ExtDM: Distribution Extrapolation Diffusion Model for Video Prediction}},
  author    = {Zhang, Zhicheng and Hu, Junyao and Cheng, Wentao and Paudel, Danda and Yang, Jufeng},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {19310--19320},
  doi       = {10.1109/CVPR52733.2024.01827},
  url       = {https://mlanthology.org/cvpr/2024/zhang2024cvpr-extdm/}
}