PixelVAE: A Latent Variable Model for Natural Images

Abstract

Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representation and model global structure well but have difficulty capturing small details. PixelCNN models details very well, but lacks a latent code and is difficult to scale for capturing large structures. We present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. Our model requires very few expensive autoregressive layers compared to PixelCNN and learns latent codes that are more compressed than a standard VAE while still capturing most non-trivial structure. Finally, we extend our model to a hierarchy of latent variables at different scales. Our model achieves state-of-the-art performance on binarized MNIST, competitive performance on 64x64 ImageNet, and high-quality samples on the LSUN bedrooms dataset.
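The abstract describes the core construction: a convolutional VAE whose decoder ends in a small number of PixelCNN-style masked convolutional layers conditioned on the latent code, so the latent carries global structure while the autoregressive layers fill in local detail. The sketch below is a minimal illustration of that idea, not the authors' implementation: PyTorch is an assumed framework, and the layer sizes, the single-level latent, and names such as `PixelVAESketch` and `MaskedConv2d` are illustrative choices for binarized 28x28 images.

```python
# Minimal PixelVAE-style sketch (assumptions: PyTorch, 28x28 binary images,
# one latent level, illustrative layer sizes). Not the paper's exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 32  # assumed latent size


class MaskedConv2d(nn.Conv2d):
    """Conv2d whose kernel is masked so each output pixel depends only on
    pixels above it and to its left (type 'A' also hides the current pixel)."""

    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ("A", "B")
        kH, kW = self.kernel_size
        mask = torch.ones(1, 1, kH, kW)
        mask[:, :, kH // 2, kW // 2 + (mask_type == "B"):] = 0  # right of center
        mask[:, :, kH // 2 + 1:, :] = 0                          # rows below
        self.register_buffer("mask", mask)

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding)


class PixelVAESketch(nn.Module):
    """VAE whose decoder ends in a few PixelCNN-style masked layers."""

    def __init__(self):
        super().__init__()
        # Encoder: strided convs -> parameters of q(z|x).
        self.enc = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 28 -> 14
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 14 -> 7
        )
        self.to_mu = nn.Linear(64 * 7 * 7, LATENT_DIM)
        self.to_logvar = nn.Linear(64 * 7 * 7, LATENT_DIM)
        # Decoder: upsample z to an image-sized feature map carrying the
        # global structure captured by the latent code.
        self.from_z = nn.Linear(LATENT_DIM, 64 * 7 * 7)
        self.dec_up = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 7 -> 14
            nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),  # 14 -> 28
        )
        # Only a few masked (autoregressive) layers model local detail,
        # conditioned on the decoded feature map via an unmasked 1x1 conv.
        self.pixel_a = MaskedConv2d("A", 1, 32, 5, padding=2)
        self.cond_proj = nn.Conv2d(32, 32, 1)
        self.pixel_b = MaskedConv2d("B", 32, 32, 5, padding=2)
        self.out = nn.Conv2d(32, 1, 1)  # Bernoulli logits per pixel

    def forward(self, x):
        h = self.enc(x).flatten(1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        cond = self.dec_up(self.from_z(z).view(-1, 64, 7, 7))
        h = F.relu(self.pixel_a(x) + self.cond_proj(cond))
        h = F.relu(self.pixel_b(h))
        logits = self.out(h)
        recon = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return (recon + kl) / x.size(0)  # negative ELBO per example
```

The full model described in the paper additionally uses deeper residual/gated blocks and a hierarchy of latent feature maps at multiple resolutions; the sketch keeps only the division of labor the abstract states, with the latent code carrying global structure and a small number of masked layers modeling local detail.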

Cite

Text

Gulrajani et al. "PixelVAE: A Latent Variable Model for Natural Images." International Conference on Learning Representations, 2017.

Markdown

[Gulrajani et al. "PixelVAE: A Latent Variable Model for Natural Images." International Conference on Learning Representations, 2017.](https://mlanthology.org/iclr/2017/gulrajani2017iclr-pixelvae/)

BibTeX

@inproceedings{gulrajani2017iclr-pixelvae,
  title     = {{PixelVAE: A Latent Variable Model for Natural Images}},
  author    = {Gulrajani, Ishaan and Kumar, Kundan and Ahmed, Faruk and Taïga, Adrien Ali and Visin, Francesco and Vázquez, David and Courville, Aaron C.},
  booktitle = {International Conference on Learning Representations},
  year      = {2017},
  url       = {https://mlanthology.org/iclr/2017/gulrajani2017iclr-pixelvae/}
}