Temporally Consistent Semantic Segmentation Using Spatially Aware Multi-View Semantic Fusion for Indoor RGB-D Videos

Abstract

Image semantic segmentation struggles to produce consistent and robust results across a sequence of video frames. The problem is more pronounced in indoor scenes, where small camera movements can lead to drastic appearance changes, occlusions, and loss of global context information. To overcome these challenges, this paper proposes a novel approach that combines multi-view semantic fusion with spatial reasoning to produce view-invariant semantic features for temporally consistent semantic segmentation of indoor RGB-D videos. Experiments on the ScanNet dataset show that the proposed spatially aware multi-view fusion mechanism significantly improves on the state-of-the-art image semantic segmentation methods Mask2Former and ViT-Adapter. In particular, compared to Mask2Former, the proposed pipeline offers improvements of 5%, 9.9%, and 14.4% in 2D mIoU, cross-view consistency, and temporal consistency, respectively. Similarly, compared to ViT-Adapter, it offers improvements of 4.8%, 8.9%, and 10.9% in the same metrics.
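For readers unfamiliar with the first reported metric, the following is a minimal sketch of how 2D mIoU (mean intersection over union) is commonly computed from predicted and ground-truth label maps. It is not the authors' evaluation code: the function name `mean_iou`, the `ignore_index` value of 255, and the choice to average only over classes that appear in either map are assumptions made for illustration. The abstract does not define the cross-view and temporal consistency metrics, so they are not sketched here.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes present in the prediction or the ground truth.

    Hypothetical helper: ignore_index=255 is an assumed "unlabeled" value.
    Predicted labels are assumed to lie in [0, num_classes).
    """
    pred = np.asarray(pred).astype(np.int64).ravel()
    target = np.asarray(target).astype(np.int64).ravel()

    # Drop pixels that carry no ground-truth label.
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Skip classes that occur in neither map to avoid dividing by zero.
    present = union > 0
    return float((intersection[present] / union[present]).mean())
```

Under these assumptions, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)` averages a per-class IoU of 1/2 for class 0 and 2/3 for class 1.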

Cite

Text

Sun et al. "Temporally Consistent Semantic Segmentation Using Spatially Aware Multi-View Semantic Fusion for Indoor RGB-D Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00459

Markdown

[Sun et al. "Temporally Consistent Semantic Segmentation Using Spatially Aware Multi-View Semantic Fusion for Indoor RGB-D Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/sun2023iccvw-temporally/) doi:10.1109/ICCVW60793.2023.00459

BibTeX

@inproceedings{sun2023iccvw-temporally,
  title     = {{Temporally Consistent Semantic Segmentation Using Spatially Aware Multi-View Semantic Fusion for Indoor RGB-D Videos}},
  author    = {Sun, Fengyuan and Karaoglu, Sezer and Gevers, Theo},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2023},
  pages     = {4250-4259},
  doi       = {10.1109/ICCVW60793.2023.00459},
  url       = {https://mlanthology.org/iccvw/2023/sun2023iccvw-temporally/}
}