Multi-View 3D Reconstruction with Transformers

Dan Wang, Xinrui Cui, Xun Chen, Zhengxia Zou, Tianyang Shi, Septimiu Salcudean, Z. Jane Wang, Rabab Ward

ICCV 2021 pp. 5722-5731

doi:10.1109/ICCV48922.2021.00567

Abstract

Deep CNN-based methods have so far achieved state-of-the-art results in multi-view 3D object reconstruction. Despite considerable progress, the two core modules of these methods, view feature extraction and multi-view fusion, are usually investigated separately, and the relations among multiple input views are rarely explored. Inspired by the recent success of Transformer models, we reformulate multi-view 3D reconstruction as a sequence-to-sequence prediction problem and propose a framework named 3D Volume Transformer. Unlike previous CNN-based methods, which use separate designs for the two modules, we unify feature extraction and view fusion in a single Transformer network. A natural advantage of our design lies in the exploration of view-to-view relationships using self-attention among multiple unordered inputs. On ShapeNet, a large-scale 3D reconstruction benchmark, our method achieves new state-of-the-art accuracy in multi-view reconstruction with 70% fewer parameters than CNN-based methods. Experimental results also suggest the strong scaling capability of our method. Our code will be made publicly available.
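
To illustrate what "self-attention among multiple unordered inputs" can mean in practice, the following is a minimal sketch of generic single-head scaled dot-product self-attention applied to one feature vector per view. It is not the authors' 3D Volume Transformer: the function names, the random projection matrices, the feature dimension, and the absence of any positional encoding are all assumptions made for illustration. Under those assumptions, reordering the input views only reorders the outputs, which is one way to see why such a layer suits an unordered set of views.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over view tokens.

    tokens: list of num_views feature vectors (hypothetical per-view features).
    w_q, w_k, w_v: projection matrices (random here, learned in a real model).
    """
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(k[0]))
    out = []
    for q_i in q:
        # Affinity of this view with every view (including itself).
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / scale for k_j in k]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of value vectors: the fused feature for this view.
        out.append([sum(w * v_j[d] for w, v_j in zip(weights, v))
                    for d in range(len(v[0]))])
    return out


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
num_views, dim = 4, 8  # illustrative sizes, not taken from the paper
views = rand_matrix(num_views, dim)
w_q, w_k, w_v = (rand_matrix(dim, dim) for _ in range(3))

out = self_attention(views, w_q, w_k, w_v)

# Feed the same views in a different order.
perm = [2, 0, 3, 1]
out_perm = self_attention([views[i] for i in perm], w_q, w_k, w_v)

# Without positional encodings, output i of the permuted run should match
# output perm[i] of the original run (permutation equivariance).
equivariant = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
    for i, p in enumerate(perm)
    for a, b in zip(out_perm[i], out[p])
)
```

Here `equivariant` checks that shuffling the views merely shuffles the fused features. A full model would stack several such layers with learned weights and decode the fused tokens into a 3D volume, which this sketch does not attempt.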

Cite

Text

Wang et al. "Multi-View 3D Reconstruction with Transformers." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00567

Markdown

[Wang et al. "Multi-View 3D Reconstruction with Transformers." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/wang2021iccv-multiview/) doi:10.1109/ICCV48922.2021.00567

BibTeX

@inproceedings{wang2021iccv-multiview,
  title     = {{Multi-View 3D Reconstruction with Transformers}},
  author    = {Wang, Dan and Cui, Xinrui and Chen, Xun and Zou, Zhengxia and Shi, Tianyang and Salcudean, Septimiu and Wang, Z. Jane and Ward, Rabab},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {5722-5731},
  doi       = {10.1109/ICCV48922.2021.00567},
  url       = {https://mlanthology.org/iccv/2021/wang2021iccv-multiview/}
}