Improved Conditional VRNNs for Video Prediction

Abstract

Predicting future frames for a video sequence is a challenging generative modeling task. Promising approaches include probabilistic latent variable models such as the Variational Auto-Encoder (VAE). While VAEs can handle uncertainty and model multiple possible future outcomes, they have a tendency to produce blurry predictions. In this work we argue that this is a sign of underfitting. To address this issue, we propose to increase the expressiveness of the latent distributions and to use higher capacity likelihood models. Our approach relies on a hierarchy of latent variables, which defines a family of flexible prior and posterior distributions in order to better model the probability of future sequences. We validate our proposal through a series of ablation experiments and compare our approach to current state-of-the-art latent variable models. Our method performs favorably under several metrics on three different datasets.

Cite

Text

Castrejon et al. "Improved Conditional VRNNs for Video Prediction." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00770

Markdown

[Castrejon et al. "Improved Conditional VRNNs for Video Prediction." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/castrejon2019iccv-improved/) doi:10.1109/ICCV.2019.00770

BibTeX

@inproceedings{castrejon2019iccv-improved,
  title     = {{Improved Conditional VRNNs for Video Prediction}},
  author    = {Castrejon, Lluis and Ballas, Nicolas and Courville, Aaron},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2019},
  doi       = {10.1109/ICCV.2019.00770},
  url       = {https://mlanthology.org/iccv/2019/castrejon2019iccv-improved/}
}