Diverse Video Generation Using a Gaussian Process Trigger
Abstract
Generating future frames given a few context (or past) frames is a challenging task. It requires modeling the temporal coherence of videos as well as the multi-modality of potential future states. Current variational approaches for video generation tend to marginalize over multi-modal future outcomes. Instead, we propose to explicitly model the multi-modality in the future outcomes and leverage it to sample diverse futures. Our approach, Diverse Video Generator, uses a Gaussian Process (GP) to learn priors on future states given the past and maintains a probability distribution over possible futures given a particular sample. We leverage the changes in this distribution over time to control the sampling of diverse future states by estimating the end of ongoing sequences. In particular, we use the variance of the GP over the output function space to trigger a change in the action sequence. We achieve state-of-the-art results on diverse future frame generation in terms of reconstruction quality and diversity of the generated sequences.
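The core mechanism described above can be illustrated with a minimal sketch (not the authors' implementation): fit a GP to the recent latent states of a video encoder, and fire a trigger when the predictive variance on the next step spikes, signaling that the ongoing action sequence is likely ending and a new diverse future should be sampled. The RBF kernel, the scikit-learn API, and the `threshold` hyperparameter here are illustrative assumptions, not details from the paper.

```python
# Sketch of a GP-variance trigger, assuming latents come from some
# pretrained video encoder. Not the paper's actual implementation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def variance_trigger(latents: np.ndarray, threshold: float = 0.5) -> bool:
    """Return True when GP predictive std for the next step exceeds threshold.

    latents: array of shape (T, d), past latent states of the sequence.
    threshold: hypothetical hyperparameter controlling trigger sensitivity.
    """
    T = len(latents)
    t = np.arange(T).reshape(-1, 1)  # time indices as GP inputs
    gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
    gp.fit(t, latents)  # posterior over the latent trajectory
    # Predictive std at the next time step; high uncertainty suggests
    # the current action sequence is ending.
    _, std = gp.predict(np.array([[T]]), return_std=True)
    return float(np.mean(std)) > threshold

# Usage: if variance_trigger(past_latents): sample a new future mode.
```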
Cite
Text
Shrivastava and Shrivastava. "Diverse Video Generation Using a Gaussian Process Trigger." International Conference on Learning Representations, 2021.
Markdown
[Shrivastava and Shrivastava. "Diverse Video Generation Using a Gaussian Process Trigger." International Conference on Learning Representations, 2021.](https://mlanthology.org/iclr/2021/shrivastava2021iclr-diverse/)
BibTeX
@inproceedings{shrivastava2021iclr-diverse,
title = {{Diverse Video Generation Using a Gaussian Process Trigger}},
author = {Shrivastava, Gaurav and Shrivastava, Abhinav},
booktitle = {International Conference on Learning Representations},
year = {2021},
url = {https://mlanthology.org/iclr/2021/shrivastava2021iclr-diverse/}
}