Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction
Abstract
Learning to predict the long-term future of video frames is notoriously challenging due to the inherent ambiguities of the distant future and the dramatic amplification of prediction error over time. Despite recent advances in the literature, existing approaches are limited to moderately short-term prediction (less than a few seconds), and extrapolating them to a longer future quickly destroys structure and content. In this work, we revisit hierarchical models in video prediction. Our method generates future frames by first estimating a sequence of dense semantic structures and subsequently translating the estimated structures to pixels with a video-to-video translation model. Despite its simplicity, we show that modeling structures and their dynamics in a categorical structure space with a stochastic sequential estimator leads to surprisingly successful long-term prediction. We evaluate our method on two challenging video prediction scenarios, car driving and human dancing, and demonstrate that it can generate complicated scene structures and motions over a very long time horizon (i.e., thousands of frames), setting a new standard of video prediction with orders of magnitude longer prediction time than existing approaches. Video results are available at https://1konny.github.io/HVP/.
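The two-stage structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the label set, the color palette, the `flip_prob` parameter, and all function names are hypothetical, and a simple random resampling rule stands in for the learned stochastic sequential estimator and the video-to-video translation model. It only shows the data flow, in which categorical structure maps are predicted autoregressively and then translated to pixels.

```python
import random

NUM_CLASSES = 3  # hypothetical label set, e.g. road / car / sky
# Hypothetical colors, one RGB triple per class.
PALETTE = {0: (128, 64, 128), 1: (0, 0, 142), 2: (70, 130, 180)}


def predict_structures(context, horizon, rng, flip_prob=0.05):
    """Stage 1 (toy): autoregressively sample future label maps.

    Each new map copies the previous one and resamples every cell's class
    with probability `flip_prob`. This is a stand-in (an assumption) for a
    learned stochastic sequential estimator over categorical structures.
    """
    prev = [row[:] for row in context[-1]]
    future = []
    for _ in range(horizon):
        nxt = [
            [rng.randrange(NUM_CLASSES) if rng.random() < flip_prob else c
             for c in row]
            for row in prev
        ]
        future.append(nxt)
        prev = nxt  # condition the next step on the sampled structure
    return future


def translate_to_pixels(label_maps):
    """Stage 2 (toy): map each label to a fixed color.

    A real system would use a learned video-to-video translation model here.
    """
    return [[[PALETTE[c] for c in row] for row in m] for m in label_maps]


def predict_video(context, horizon, seed=0):
    """Run both stages and return (structures, frames)."""
    rng = random.Random(seed)
    structures = predict_structures(context, horizon, rng)
    return structures, translate_to_pixels(structures)
```

Calling `predict_video([[[0, 0, 2], [0, 1, 2]]], horizon=1000)` returns 1000 label maps and the corresponding 1000 RGB frames. Because the rollout stays in the discrete label space, every predicted cell remains a valid class no matter how long the horizon is.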
Cite
Text
Lee et al. "Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction." International Conference on Learning Representations, 2021.
Markdown
[Lee et al. "Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction." International Conference on Learning Representations, 2021.](https://mlanthology.org/iclr/2021/lee2021iclr-revisiting/)
BibTeX
@inproceedings{lee2021iclr-revisiting,
  title = {{Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction}},
  author = {Lee, Wonkwang and Jung, Whie and Zhang, Han and Chen, Ting and Koh, Jing Yu and Huang, Thomas and Yoon, Hyungsuk and Lee, Honglak and Hong, Seunghoon},
  booktitle = {International Conference on Learning Representations},
  year = {2021},
  url = {https://mlanthology.org/iclr/2021/lee2021iclr-revisiting/}
}