Image-to-Voxel Model Translation with Conditional Adversarial Networks
Abstract
We present a single-view voxel model prediction method based on generative adversarial networks. Our method uses correspondences between 2D silhouettes and slices of a camera frustum to predict a voxel model of a scene containing multiple object instances. We exploit a pyramid-shaped voxel representation and a generator network with skip connections between 2D and 3D feature maps. To train our framework, we collected two datasets, VoxelCity and VoxelHome, with 36,416 images of 28 scenes annotated with ground-truth 3D models, depth maps, and 6D object poses. We made the datasets publicly available ( http://www.zefirus.org/Z_GAN ). We evaluate our framework on 3D shape datasets and show that it delivers robust 3D scene reconstruction results that compete with and surpass the state of the art in scene reconstruction with multiple non-rigid objects.
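The skip connections described above link 2D encoder feature maps to 3D decoder feature maps. A minimal sketch of one way such a connection can work, assuming the 2D silhouette features are simply tiled along the frustum's depth axis to form a 3D feature volume (the function name and exact fusion operation are illustrative, not the paper's implementation):

```python
import numpy as np

def skip_2d_to_3d(feat_2d: np.ndarray, depth: int) -> np.ndarray:
    """Lift a 2D feature map (C, H, W) into a 3D feature volume
    (C, D, H, W) by tiling it along a new depth axis.

    Hypothetical sketch: each of the `depth` frustum slices reuses
    the same 2D silhouette features, which a 3D decoder could then
    concatenate with its own activations at the matching resolution.
    """
    c, h, w = feat_2d.shape
    # Insert a depth axis and broadcast, then copy to get a writable array.
    return np.broadcast_to(feat_2d[:, None, :, :], (c, depth, h, w)).copy()

# Example: a 64-channel 8x8 feature map lifted into a 4-slice volume.
feat = np.random.rand(64, 8, 8).astype(np.float32)
vol = skip_2d_to_3d(feat, depth=4)
print(vol.shape)  # (64, 4, 8, 8)
```

In a full generator, each 3D decoder stage would typically concatenate such a lifted volume with its upsampled activations before the next 3D convolution.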
Cite
Text
Knyaz et al. "Image-to-Voxel Model Translation with Conditional Adversarial Networks." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11009-3_37
Markdown
[Knyaz et al. "Image-to-Voxel Model Translation with Conditional Adversarial Networks." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/knyaz2018eccvw-imagetovoxel/) doi:10.1007/978-3-030-11009-3_37
BibTeX
@inproceedings{knyaz2018eccvw-imagetovoxel,
title = {{Image-to-Voxel Model Translation with Conditional Adversarial Networks}},
author = {Knyaz, Vladimir A. and Kniaz, Vladimir V. and Remondino, Fabio},
booktitle = {European Conference on Computer Vision Workshops},
year = {2018},
pages = {601--618},
doi = {10.1007/978-3-030-11009-3_37},
url = {https://mlanthology.org/eccvw/2018/knyaz2018eccvw-imagetovoxel/}
}