Improved Conditional VRNNs for Video Prediction
Abstract
Predicting future frames for a video sequence is a challenging generative modeling task. Promising approaches include probabilistic latent variable models such as the Variational Auto-Encoder (VAE). While VAEs can handle uncertainty and model multiple possible future outcomes, they have a tendency to produce blurry predictions. In this work we argue that this is a sign of underfitting. To address this issue, we propose to increase the expressiveness of the latent distributions and to use higher capacity likelihood models. Our approach relies on a hierarchy of latent variables, which defines a family of flexible prior and posterior distributions in order to better model the probability of future sequences. We validate our proposal through a series of ablation experiments and compare our approach to current state-of-the-art latent variable models. Our method performs favorably under several metrics on three different datasets.
Cite
Text
Castrejon et al. "Improved Conditional VRNNs for Video Prediction." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00770
Markdown
[Castrejon et al. "Improved Conditional VRNNs for Video Prediction." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/castrejon2019iccv-improved/) doi:10.1109/ICCV.2019.00770
BibTeX
@inproceedings{castrejon2019iccv-improved,
title = {{Improved Conditional VRNNs for Video Prediction}},
author = {Castrejon, Lluis and Ballas, Nicolas and Courville, Aaron},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
year = {2019},
doi = {10.1109/ICCV.2019.00770},
url = {https://mlanthology.org/iccv/2019/castrejon2019iccv-improved/}
}