TransFusion: Multi-Modal Fusion Network for Semantic Segmentation
Abstract
The complementary properties of 2D color images and 3D point clouds can potentially improve semantic segmentation compared to using uni-modal data. Multi-modal data fusion is, however, challenging due to the heterogeneity and dimensionality of the data, the difficulty of aligning the modalities to a common reference frame, and the presence of modality-specific bias. To address this, we propose a new model, TransFusion, for semantic segmentation that fuses images directly with point clouds, without the need for lossy pre-processing of the point clouds. TransFusion outperforms the baseline FCN model that uses images with depth maps, improving mIoU by 4% and 2% on the Vaihingen and Potsdam datasets, respectively. We demonstrate that our proposed model adequately learns spatial and structural information, resulting in better inference.
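To make the fusion idea concrete, the following is a minimal, hypothetical PyTorch sketch of cross-attention between flattened image tokens and raw point-cloud tokens, in the spirit of fusing the two modalities without first rasterizing the points into a depth map. All names here (ImagePointFusion, d_model, n_heads) are illustrative assumptions, not the authors' released code or exact architecture.

# Hypothetical sketch only: NOT the authors' implementation of TransFusion.
import torch
import torch.nn as nn

class ImagePointFusion(nn.Module):
    """Fuse 2D image features with 3D point features via cross-attention.

    Flattened image pixels act as queries and attend to point-cloud tokens,
    so structural 3D cues reach the image branch without a lossy
    point-to-depth-map conversion step.
    """
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.img_proj = nn.Linear(3, d_model)   # RGB pixel -> token embedding
        self.pt_proj = nn.Linear(3, d_model)    # raw XYZ point -> token embedding
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, img: torch.Tensor, pts: torch.Tensor) -> torch.Tensor:
        # img: (B, H*W, 3) flattened RGB pixels; pts: (B, N, 3) raw XYZ points
        q = self.img_proj(img)                  # queries from the image tokens
        kv = self.pt_proj(pts)                  # keys/values from the point tokens
        fused, _ = self.cross_attn(q, kv, kv)   # pixels attend to points
        return self.norm(q + fused)             # residual connection + layer norm

if __name__ == "__main__":
    img = torch.rand(2, 32 * 32, 3)   # toy batch: two 32x32 RGB images
    pts = torch.rand(2, 1024, 3)      # toy batch: 1024 points per sample
    out = ImagePointFusion()(img, pts)
    print(out.shape)                  # torch.Size([2, 1024, 64])

A per-pixel classification head over the fused tokens would then produce the segmentation map; the hedged point of the sketch is only that raw points can feed attention directly, which is the property the abstract contrasts with depth-map baselines.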
Cite
Text
Maiti et al. "TransFusion: Multi-Modal Fusion Network for Semantic Segmentation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00695Markdown
[Maiti et al. "TransFusion: Multi-Modal Fusion Network for Semantic Segmentation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/maiti2023cvprw-transfusion/) doi:10.1109/CVPRW59228.2023.00695BibTeX
@inproceedings{maiti2023cvprw-transfusion,
title = {{TransFusion: Multi-Modal Fusion Network for Semantic Segmentation}},
author = {Maiti, Abhisek and Oude Elberink, Sander and Vosselman, George},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2023},
pages = {6537--6547},
doi = {10.1109/CVPRW59228.2023.00695},
url = {https://mlanthology.org/cvprw/2023/maiti2023cvprw-transfusion/}
}