Image-to-Voxel Model Translation with Conditional Adversarial Networks

Abstract

We present a single-view voxel model prediction method that uses generative adversarial networks. Our method exploits correspondences between 2D silhouettes and slices of the camera frustum to predict a voxel model of a scene with multiple object instances. We use a pyramid-shaped voxel model and a generator network with skip connections between 2D and 3D feature maps. We collected two datasets, VoxelCity and VoxelHome, to train our framework: 36,416 images of 28 scenes with ground-truth 3D models, depth maps, and 6D object poses. The datasets are publicly available (http://www.zefirus.org/Z_GAN). We evaluate our framework on 3D shape datasets and show that it delivers robust 3D scene reconstruction results that compete with and surpass the state of the art in scene reconstruction with multiple non-rigid objects.
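For intuition, the sketch below illustrates the key architectural idea in the abstract: a 2D image encoder whose feature maps are tiled along the camera-frustum depth axis and concatenated with a 3D decoder's feature maps, giving skip connections between 2D and 3D feature maps. This is a minimal PyTorch sketch under assumed layer sizes and a 64^3 output grid; names and dimensions are illustrative, not the authors' exact Z-GAN architecture.

# Minimal sketch (illustrative, not the authors' exact Z-GAN): a 2D encoder
# with 2D->3D skip connections into a 3D decoder.
import torch
import torch.nn as nn

class Image2VoxelGenerator(nn.Module):
    """Maps a 3x64x64 image to a 1x64x64x64 voxel occupancy grid (assumed sizes)."""

    def __init__(self):
        super().__init__()
        # 2D encoder: each stage halves the spatial resolution.
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 4, 2, 1), nn.LeakyReLU(0.2))
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2))
        self.enc3 = nn.Sequential(nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2))
        # 3D decoder: each stage doubles depth/height/width; the "+" terms are
        # channels of the tiled 2D skip features concatenated in forward().
        self.dec3 = nn.Sequential(nn.ConvTranspose3d(16, 64, 4, 2, 1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose3d(64 + 64, 32, 4, 2, 1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.ConvTranspose3d(32 + 32, 1, 4, 2, 1), nn.Sigmoid())

    @staticmethod
    def tile(f2d, depth):
        # 2D->3D skip connection: repeat a (B, C, H, W) feature map along a
        # new depth axis to obtain (B, C, D, H, W).
        return f2d.unsqueeze(2).expand(-1, -1, depth, -1, -1)

    def forward(self, x):
        x1 = self.enc1(x)             # (B, 32, 32, 32)
        x2 = self.enc2(x1)            # (B, 64, 16, 16)
        x3 = self.enc3(x2)            # (B, 128, 8, 8)
        # Lift the 2D bottleneck into 3D by folding channels into depth.
        z = x3.view(-1, 16, 8, 8, 8)  # (B, 16, 8, 8, 8)
        d3 = self.dec3(z)             # (B, 64, 16, 16, 16)
        d2 = self.dec2(torch.cat([d3, self.tile(x2, 16)], dim=1))    # (B, 32, 32, 32, 32)
        return self.dec1(torch.cat([d2, self.tile(x1, 32)], dim=1))  # (B, 1, 64, 64, 64)

voxels = Image2VoxelGenerator()(torch.randn(1, 3, 64, 64))
print(voxels.shape)  # torch.Size([1, 1, 64, 64, 64])

In a conditional-GAN setting such as the one described, a generator like this would be trained jointly with a discriminator and a reconstruction loss on the voxel grid; those components are omitted here for brevity.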

Cite

Text

Knyaz et al. "Image-to-Voxel Model Translation with Conditional Adversarial Networks." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11009-3_37

Markdown

[Knyaz et al. "Image-to-Voxel Model Translation with Conditional Adversarial Networks." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/knyaz2018eccvw-imagetovoxel/) doi:10.1007/978-3-030-11009-3_37

BibTeX

@inproceedings{knyaz2018eccvw-imagetovoxel,
  title     = {{Image-to-Voxel Model Translation with Conditional Adversarial Networks}},
  author    = {Knyaz, Vladimir A. and Kniaz, Vladimir V. and Remondino, Fabio},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2018},
  pages     = {601--618},
  doi       = {10.1007/978-3-030-11009-3_37},
  url       = {https://mlanthology.org/eccvw/2018/knyaz2018eccvw-imagetovoxel/}
}