TMVNet : Using Transformers for Multi-View Voxel-Based 3D Reconstruction
Abstract
Previous research in multi-view 3D reconstruction has used different convolutional neural network (CNN) architectures to obtain a 3D voxel representation. Even though CNNs work well, they have limitations in exploiting long-range dependencies in sequence transduction tasks such as multi-view 3D reconstruction. In this paper, we propose TMVNet, a two-layer transformer encoder that can better use long-range dependency information. In contrast to the 2D CNN decoders used by previous approaches, our model uses a 3D CNN decoder to capture the relations between voxels in 3D space. In addition, our proposed 3D feature fusion network aggregates the 3D position features from the CNN and the long-range dependency features from the transformer. The proposed TMVNet is trained and tested on the ShapeNet dataset. Comparisons against ten state-of-the-art multi-view 3D reconstruction methods, along with the reported quantitative and qualitative results, showcase the superiority of our method.
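The pipeline the abstract describes (per-view features attended by a two-layer transformer encoder, fused, then decoded to a voxel grid with 3D convolutions) can be sketched in PyTorch as below. This is a minimal illustrative sketch, not the authors' implementation: the feature dimension, head count, fusion-by-averaging step, and voxel resolution are all assumptions for demonstration.

```python
import torch
import torch.nn as nn

class TMVNetSketch(nn.Module):
    """Hypothetical sketch of the TMVNet-style pipeline from the abstract."""

    def __init__(self, feat_dim=256, n_heads=8):
        super().__init__()
        # Two-layer transformer encoder over the sequence of view features,
        # capturing long-range dependencies across views.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Map the fused feature vector to a small 3D grid, then upsample
        # with 3D transposed convolutions to a 32^3 occupancy volume.
        self.fc = nn.Linear(feat_dim, 64 * 4 * 4 * 4)
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1),  # 4 -> 8
            nn.ReLU(),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),   # 16 -> 32
            nn.Sigmoid(),  # per-voxel occupancy probability in [0, 1]
        )

    def forward(self, view_feats):
        # view_feats: (batch, n_views, feat_dim) per-view image features
        encoded = self.encoder(view_feats)      # attend across views
        fused = encoded.mean(dim=1)             # simple view aggregation
        grid = self.fc(fused).view(-1, 64, 4, 4, 4)
        return self.decoder(grid)               # (batch, 1, 32, 32, 32)

model = TMVNetSketch()
out = model(torch.randn(2, 5, 256))  # 2 samples, 5 views each
print(out.shape)  # torch.Size([2, 1, 32, 32, 32])
```

The averaging fusion stands in for the paper's 3D feature fusion network; the point is only to show where the transformer (cross-view, long-range) and the 3D CNN (voxel-space) components sit relative to each other.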
Cite
Text

Peng et al. "TMVNet : Using Transformers for Multi-View Voxel-Based 3D Reconstruction." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022. doi:10.1109/CVPRW56347.2022.00036

Markdown

[Peng et al. "TMVNet : Using Transformers for Multi-View Voxel-Based 3D Reconstruction." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022.](https://mlanthology.org/cvprw/2022/peng2022cvprw-tmvnet/) doi:10.1109/CVPRW56347.2022.00036

BibTeX
@inproceedings{peng2022cvprw-tmvnet,
title = {{TMVNet : Using Transformers for Multi-View Voxel-Based 3D Reconstruction}},
author = {Peng, Kebin and Islam, Rifatul and Quarles, John and Desai, Kevin},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2022},
pages = {221-229},
doi = {10.1109/CVPRW56347.2022.00036},
url = {https://mlanthology.org/cvprw/2022/peng2022cvprw-tmvnet/}
}