Video Pixel Networks

Abstract

We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.
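The four-dimensional dependency chain described above corresponds to an autoregressive factorization of the joint distribution over pixels, ordered over time, then raster-scan space, then color channels, as in the PixelCNN family of models. A sketch of this chain rule (the notation here is illustrative, not the paper's exact symbols):

```latex
p(\mathbf{x}) \;=\; \prod_{t=1}^{T} \prod_{i=1}^{H} \prod_{j=1}^{W} \prod_{c \in \{R,G,B\}}
  p\!\left(x_{t,i,j,c} \,\middle|\, \mathbf{x}_{<t},\; \mathbf{x}_{t,<(i,j)},\; x_{t,i,j,<c}\right)
```

Here $\mathbf{x}_{<t}$ denotes all previous frames, $\mathbf{x}_{t,<(i,j)}$ the pixels above and to the left of $(i,j)$ in the current frame, and $x_{t,i,j,<c}$ the already-generated color channels of the current pixel; each conditional is a discrete distribution over raw pixel values.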

Cite

Text

Kalchbrenner et al. "Video Pixel Networks." International Conference on Machine Learning, 2017.

Markdown

[Kalchbrenner et al. "Video Pixel Networks." International Conference on Machine Learning, 2017.](https://mlanthology.org/icml/2017/kalchbrenner2017icml-video/)

BibTeX

@inproceedings{kalchbrenner2017icml-video,
  title     = {{Video Pixel Networks}},
  author    = {Kalchbrenner, Nal and van den Oord, Aäron and Simonyan, Karen and Danihelka, Ivo and Vinyals, Oriol and Graves, Alex and Kavukcuoglu, Koray},
  booktitle = {International Conference on Machine Learning},
  year      = {2017},
  pages     = {1771--1779},
  volume    = {70},
  url       = {https://mlanthology.org/icml/2017/kalchbrenner2017icml-video/}
}