Video Pixel Networks
Abstract
We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.
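As a rough sketch (notation assumed here, not quoted from the paper), the four-dimensional dependency chain described above corresponds to a pixel-by-pixel factorization of the discrete joint distribution over a video $\mathbf{x}$ with $T$ frames, $N$ pixels per frame, and three color channels:

```latex
p(\mathbf{x}) \;=\; \prod_{t=1}^{T} \; \prod_{i=1}^{N} \; \prod_{c \in \{R,G,B\}}
p\bigl(x_{t,i,c} \,\big|\, \mathbf{x}_{<t},\; \mathbf{x}_{t,<i},\; x_{t,i,<c}\bigr)
```

Each factor conditions on all previous frames $\mathbf{x}_{<t}$, the already-generated pixels $\mathbf{x}_{t,<i}$ of the current frame, and the already-generated color channels $x_{t,i,<c}$ of the current pixel, with each conditional a discrete (e.g. softmax) distribution over raw pixel values.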
Cite

Text
Kalchbrenner et al. "Video Pixel Networks." International Conference on Machine Learning, 2017.

Markdown
[Kalchbrenner et al. "Video Pixel Networks." International Conference on Machine Learning, 2017.](https://mlanthology.org/icml/2017/kalchbrenner2017icml-video/)

BibTeX
@inproceedings{kalchbrenner2017icml-video,
title = {{Video Pixel Networks}},
author = {Kalchbrenner, Nal and van den Oord, Aäron and Simonyan, Karen and Danihelka, Ivo and Vinyals, Oriol and Graves, Alex and Kavukcuoglu, Koray},
booktitle = {International Conference on Machine Learning},
year = {2017},
pages = {1771--1779},
volume = {70},
url = {https://mlanthology.org/icml/2017/kalchbrenner2017icml-video/}
}