Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation
Abstract
Object class, depth, and shape are instantly reconstructed by a human looking at a 2D image. While modern deep models solve each of these challenging tasks separately, they struggle to perform simultaneous 3D scene reconstruction and segmentation. We propose a single-shot image-to-semantic voxel model translation framework. We train a generator adversarially against a discriminator that verifies object poses. Furthermore, trapezium-shaped voxels, volumetric residual blocks, and 2D-to-3D skip connections help our model reason explicitly about 3D scene structure. We collected the SemanticVoxels dataset, which contains 116k images with ground-truth semantic voxel models, depth maps, and 6D object poses. Experiments on the ShapeNet and SemanticVoxels datasets demonstrate that our framework matches and surpasses the state of the art in reconstructing scenes with multiple non-rigid objects of different classes. Our model and dataset are publicly available.
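The abstract names two architectural building blocks: volumetric residual blocks and 2D-to-3D skip connections. Below is a minimal PyTorch sketch of what such components might look like; the class names, tensor shapes, and layer choices are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a volumetric residual block and a 2D-to-3D skip
# connection. All names and shapes are assumed for illustration; the
# paper's architecture may differ.
import torch
import torch.nn as nn


class VolumetricResBlock(nn.Module):
    """3D residual block: two 3x3x3 convolutions with an identity shortcut."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(x + self.body(x))


class Skip2DTo3D(nn.Module):
    """Lift a 2D encoder feature map to 3D and fuse it with decoder voxels.

    The 2D map (B, C2, H, W) is tiled along a new depth axis to match the
    voxel grid (B, C3, D, H, W), concatenated, and projected back to C3.
    """

    def __init__(self, c2d: int, c3d: int):
        super().__init__()
        self.fuse = nn.Conv3d(c2d + c3d, c3d, kernel_size=1)

    def forward(self, feat2d: torch.Tensor, vox: torch.Tensor) -> torch.Tensor:
        depth = vox.shape[2]
        lifted = feat2d.unsqueeze(2).expand(-1, -1, depth, -1, -1)
        return self.fuse(torch.cat([lifted, vox], dim=1))


if __name__ == "__main__":
    vox = torch.randn(1, 32, 16, 16, 16)   # decoder voxel features
    feat = torch.randn(1, 64, 16, 16)      # 2D encoder features
    vox = Skip2DTo3D(64, 32)(feat, vox)    # 2D-to-3D skip connection
    out = VolumetricResBlock(32)(vox)      # volumetric residual block
    print(out.shape)                       # torch.Size([1, 32, 16, 16, 16])
```

Tiling the image features along depth is one common way to lift 2D evidence into a voxel grid; the trapezium-shaped voxels described in the abstract would instead shape the grid to follow the camera frustum.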
Cite
Text
Kniaz et al. "Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58571-6_7

Markdown
[Kniaz et al. "Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/kniaz2020eccv-imagetovoxel/) doi:10.1007/978-3-030-58571-6_7

BibTeX
@inproceedings{kniaz2020eccv-imagetovoxel,
  title = {{Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation}},
  author = {Kniaz, Vladimir V. and Knyaz, Vladimir A. and Remondino, Fabio and Bordodymov, Artem and Moshkantsev, Petr},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year = {2020},
  doi = {10.1007/978-3-030-58571-6_7},
  url = {https://mlanthology.org/eccv/2020/kniaz2020eccv-imagetovoxel/}
}