Generative Models of Visually Grounded Imagination

Abstract

It is easy for people to imagine what a man with pink hair looks like, even if they have never seen such a person before. We call the ability to create images of novel semantic concepts visually grounded imagination. In this paper, we show how we can modify variational auto-encoders to perform this task. Our method uses a novel training objective, and a novel product-of-experts inference network, which can handle partially specified (abstract) concepts in a principled and efficient way. We also propose a set of easy-to-compute evaluation metrics that capture our intuitive notions of what it means to have good visual imagination, namely correctness, coverage, and compositionality (the 3 C’s). Finally, we perform a detailed comparison of our method with two existing joint image-attribute VAE methods (the JMVAE method of Suzuki et al., 2017 and the BiVCCA method of Wang et al., 2016) by applying them to two datasets: the MNIST-with-attributes dataset (which we introduce here), and the CelebA dataset (Liu et al., 2015).
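The abstract mentions a product-of-experts inference network that can handle partially specified concepts by combining evidence from whichever modalities are observed. Below is a minimal illustrative sketch (not the authors' released code) of the standard Gaussian product-of-experts combination that such an inference network relies on: each observed modality contributes a diagonal-Gaussian "expert" over the latent code, and experts for missing inputs are simply dropped. The function name `poe_gaussian` and the toy dimensions are assumptions introduced here for illustration.

```python
import numpy as np

def poe_gaussian(means, logvars):
    """Combine diagonal Gaussian experts N(mu_i, sigma_i^2) into one Gaussian.

    The product of Gaussians is again Gaussian: its precision is the sum of the
    expert precisions, and its mean is the precision-weighted average of the
    expert means.
    """
    precisions = [np.exp(-lv) for lv in logvars]   # 1 / sigma_i^2
    total_precision = sum(precisions)
    combined_var = 1.0 / total_precision
    combined_mean = combined_var * sum(p * m for p, m in zip(precisions, means))
    return combined_mean, np.log(combined_var)

# Toy usage: a prior expert N(0, I) plus an image expert and an attribute
# expert. To handle a partially specified (abstract) concept, omit the experts
# for unobserved modalities before combining.
latent_dim = 10
prior = (np.zeros(latent_dim), np.zeros(latent_dim))        # N(0, I)
image_expert = (np.random.randn(latent_dim), np.full(latent_dim, -1.0))
attr_expert = (np.random.randn(latent_dim), np.full(latent_dim, -0.5))

experts = [prior, image_expert, attr_expert]  # drop any expert that is unobserved
mu, logvar = poe_gaussian([m for m, _ in experts], [lv for _, lv in experts])
```

This precision-weighted combination is what makes the approach efficient for abstract queries: specifying fewer attributes removes experts, which widens the resulting posterior rather than requiring a separate inference network per subset of inputs.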

Cite

Text

Vedantam et al. "Generative Models of Visually Grounded Imagination." International Conference on Learning Representations, 2018.

Markdown

[Vedantam et al. "Generative Models of Visually Grounded Imagination." International Conference on Learning Representations, 2018.](https://mlanthology.org/iclr/2018/vedantam2018iclr-generative/)

BibTeX

@inproceedings{vedantam2018iclr-generative,
  title     = {{Generative Models of Visually Grounded Imagination}},
  author    = {Vedantam, Ramakrishna and Fischer, Ian and Huang, Jonathan and Murphy, Kevin},
  booktitle = {International Conference on Learning Representations},
  year      = {2018},
  url       = {https://mlanthology.org/iclr/2018/vedantam2018iclr-generative/}
}