Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation

Vladimir V. Kniaz, Vladimir A. Knyaz, Fabio Remondino, Artem Bordodymov, Petr Moshkantsev

ECCV 2020

doi:10.1007/978-3-030-58571-6_7

Abstract

Object class, depth, and shape are instantly reconstructed by a human looking at a 2D image. While modern deep models solve each of these challenging tasks separately, they struggle to perform simultaneous 3D scene reconstruction and segmentation. We propose a single-shot image-to-semantic-voxel-model translation framework. We train a generator adversarially against a discriminator that verifies the objects' poses. Furthermore, trapezium-shaped voxels, volumetric residual blocks, and 2D-to-3D skip connections help our model learn to reason explicitly about 3D scene structure. We collected a SemanticVoxels dataset with 116k images, ground-truth semantic voxel models, depth maps, and 6D object poses. Experiments on ShapeNet and our SemanticVoxels datasets demonstrate that our framework achieves and surpasses state-of-the-art results in the reconstruction of scenes with multiple non-rigid objects of different classes. We made our model and dataset publicly available.
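The abstract does not spell out the geometry of the trapezium-shaped voxels. One plausible reading is a voxel grid aligned with the camera frustum, where each voxel column follows the viewing ray of one pixel, so voxels widen with depth. The sketch below illustrates that reading only. The pinhole camera model, the centered principal point, the uniform depth spacing, and every name in it (`frustum_voxel_center`, `voxel_footprint`, `focal_px`, `z_near`, `z_far`) are assumptions for illustration, not the authors' implementation.

```python
from typing import Tuple


def frustum_voxel_center(
    u: int, v: int, k: int,
    width: int, height: int, depth_bins: int,
    focal_px: float, z_near: float, z_far: float,
) -> Tuple[float, float, float]:
    """Camera-space (x, y, z) center of frustum voxel (u, v, k).

    (u, v) index the pixel column and row, k indexes the depth slice.
    Assumes a pinhole camera with the principal point at the image center
    and depth slices spaced uniformly between z_near and z_far.
    """
    if not (0 <= u < width and 0 <= v < height and 0 <= k < depth_bins):
        raise IndexError("voxel index out of range")

    # Depth of the slice center (uniform spacing is an assumption).
    z = z_near + (k + 0.5) * (z_far - z_near) / depth_bins

    # Back-project the pixel center along its viewing ray to depth z.
    cx, cy = width / 2.0, height / 2.0
    x = (u + 0.5 - cx) * z / focal_px
    y = (v + 0.5 - cy) * z / focal_px
    return (x, y, z)


def voxel_footprint(z: float, focal_px: float) -> float:
    """Side length of a voxel cross-section at depth z (one back-projected pixel)."""
    return z / focal_px
```

Under these assumptions each voxel maps to exactly one pixel in the input image, which is the kind of alignment that would let 2D feature maps be passed to 3D layers without resampling. The cross-section returned by `voxel_footprint` grows linearly with depth, which gives the voxels their trapezium-shaped profile.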

Cite

Text

Kniaz et al. "Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58571-6_7

Markdown

[Kniaz et al. "Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/kniaz2020eccv-imagetovoxel/) doi:10.1007/978-3-030-58571-6_7

BibTeX

@inproceedings{kniaz2020eccv-imagetovoxel,
  title     = {{Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation}},
  author    = {Kniaz, Vladimir V. and Knyaz, Vladimir A. and Remondino, Fabio and Bordodymov, Artem and Moshkantsev, Petr},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2020},
  doi       = {10.1007/978-3-030-58571-6_7},
  url       = {https://mlanthology.org/eccv/2020/kniaz2020eccv-imagetovoxel/}
}