Probabilistic Video Prediction Using Conditional Score Diffusion

Abstract

Temporal prediction of natural videos is inherently uncertain, but explicit probabilistic modeling and inference suffer from statistical and computational challenges in high dimensions. We describe an implicit regression-based framework for estimating and sampling the conditional density of the next frame in a video given previous observed frames. We show that sequence-to-image deep networks trained on a simple resilience-to-noise objective function extract adaptive representations for temporal prediction. Synthetic experiments demonstrate that this score-based framework can handle occlusion boundaries: unlike classical methods that average over bifurcating temporal trajectories, it chooses among likely trajectories, selecting more probable options with higher frequency. Furthermore, analysis of networks trained on natural videos reveals that the learned representations exploit spatio-temporal continuity and automatically weight predictive evidence by its reliability.
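The "resilience-to-noise" objective referenced in the abstract is a denoising objective: a least-squares denoiser's residual is proportional to the score of the noise-smoothed density (Miyasawa/Tweedie relation), which enables score-based sampling. The sketch below illustrates this connection on a scalar Gaussian toy case where the optimal denoiser has a closed form; all names and parameters here are hypothetical stand-ins for the paper's conditional sequence-to-image network.

```python
import numpy as np

TAU2 = 2.0    # prior variance of the toy scalar "next frame" signal
SIGMA = 0.5   # noise level used in the denoising objective

def optimal_denoiser(y, sigma2=SIGMA**2, tau2=TAU2):
    """Least-squares denoiser for a zero-mean Gaussian signal prior.
    In the paper's setting, this role is played by a trained network
    conditioned on the previously observed frames."""
    return tau2 / (tau2 + sigma2) * y

def score_from_denoiser(y, sigma2=SIGMA**2):
    """Miyasawa/Tweedie relation: the denoising residual, divided by the
    noise variance, equals the score of the noisy density."""
    return (optimal_denoiser(y, sigma2) - y) / sigma2

# Sanity check: here the noisy density is N(0, TAU2 + SIGMA^2),
# whose score at y is -y / (TAU2 + SIGMA^2).
y = 1.3
analytic_score = -y / (TAU2 + SIGMA**2)
assert np.isclose(score_from_denoiser(y), analytic_score)

# Coarse-to-fine sampling sketch: alternate partial denoising steps with
# injected noise at decreasing amplitude, as in score-based samplers.
rng = np.random.default_rng(0)
x = rng.normal(scale=3.0)  # arbitrary initialization
for noise_level in (1.0, 0.5, 0.25, 0.1):
    residual = optimal_denoiser(x, noise_level**2) - x
    x = x + 0.5 * residual + 0.1 * noise_level * rng.normal()
```

Conditioning on past frames would simply add a context argument to the denoiser; the score relation and the sampling loop are otherwise unchanged.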

Cite

Text

Fiquet and Simoncelli. "Probabilistic Video Prediction Using Conditional Score Diffusion." ICLR 2025 Workshops: FPI, 2025.

Markdown

[Fiquet and Simoncelli. "Probabilistic Video Prediction Using Conditional Score Diffusion." ICLR 2025 Workshops: FPI, 2025.](https://mlanthology.org/iclrw/2025/fiquet2025iclrw-probabilistic/)

BibTeX

@inproceedings{fiquet2025iclrw-probabilistic,
  title     = {{Probabilistic Video Prediction Using Conditional Score Diffusion}},
  author    = {Fiquet, Pierre-Etienne H and Simoncelli, Eero P},
  booktitle = {ICLR 2025 Workshops: FPI},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/fiquet2025iclrw-probabilistic/}
}