Visual Object Networks: Image Generation with Disentangled 3D Representations

Zhu, Jun-Yan; Zhang, Zhoutong; Zhang, Chengkai; Wu, Jiajun; Torralba, Antonio; Tenenbaum, Josh; Freeman, Bill

NeurIPS 2018 pp. 118-129

Abstract

Recent progress in deep generative models has led to tremendous breakthroughs in image generation. Although existing models can synthesize photorealistic images, they lack an understanding of the underlying 3D world. Unlike previous work built on 2D datasets and models, we present a new generative model, Visual Object Networks (VONs), that synthesizes natural images of objects with a disentangled 3D representation. Inspired by classic graphics rendering pipelines, we unravel the image formation process into three conditionally independent factors (shape, viewpoint, and texture) and present an end-to-end adversarial learning framework that jointly models 3D shape and 2D texture. Our model first learns to synthesize 3D shapes that are indistinguishable from real shapes. It then renders the object's 2.5D sketches (i.e., silhouette and depth map) from its shape under a sampled viewpoint. Finally, it learns to add realistic textures to these 2.5D sketches to generate realistic images. VON not only generates images that are more realistic than those of state-of-the-art 2D image synthesis methods but also enables many 3D operations, such as changing the viewpoint of a generated image, editing shape and texture, linearly interpolating in texture and shape space, and transferring appearance across different objects and viewpoints.
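
The middle stage described in the abstract, rendering a silhouette and depth map from a 3D shape under a viewpoint, can be illustrated with a toy example. The sketch below is not the paper's renderer: it assumes a voxel occupancy grid, an orthographic camera looking along the first array axis, and viewpoints limited to 90-degree turns. The function name `render_sketch` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def render_sketch(voxels, quarter_turns=0, threshold=0.5):
    """Orthographic silhouette and depth map of an occupancy grid.

    Assumptions (not from the paper): `voxels` is a (D, H, W) array of
    occupancy values in [0, 1], axis 0 is the viewing direction, and
    `quarter_turns` rotates the grid about the vertical (H) axis in
    90-degree steps as a crude stand-in for a sampled viewpoint.
    """
    # Rotate in the (depth, width) plane, i.e. about the vertical axis.
    rotated = np.rot90(voxels, k=quarter_turns, axes=(0, 2))
    occupied = rotated > threshold

    # A pixel is in the silhouette if any voxel along its ray is occupied.
    silhouette = occupied.any(axis=0)

    # Depth is the index of the first occupied voxel along each ray.
    # argmax returns 0 for empty rays, so those are masked to infinity.
    first_hit = occupied.argmax(axis=0).astype(float)
    depth = np.where(silhouette, first_hit, np.inf)
    return silhouette, depth

# Toy shape: a 2x2x2 cube centered in a 4x4x4 grid.
grid = np.zeros((4, 4, 4))
grid[1:3, 1:3, 1:3] = 1.0
silhouette, depth = render_sketch(grid)
```

Because this version uses hard thresholds and `argmax`, it is not differentiable, so it could not be trained end to end the way the abstract describes. It only shows what the two 2.5D outputs contain.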

Cite

Text

Zhu et al. "Visual Object Networks: Image Generation with Disentangled 3D Representations." Neural Information Processing Systems, 2018.

Markdown

[Zhu et al. "Visual Object Networks: Image Generation with Disentangled 3D Representations." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/zhu2018neurips-visual/)

BibTeX

@inproceedings{zhu2018neurips-visual,
  title     = {{Visual Object Networks: Image Generation with Disentangled 3D Representations}},
  author    = {Zhu, Jun-Yan and Zhang, Zhoutong and Zhang, Chengkai and Wu, Jiajun and Torralba, Antonio and Tenenbaum, Josh and Freeman, Bill},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {118-129},
  url       = {https://mlanthology.org/neurips/2018/zhu2018neurips-visual/}
}