Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation

Abstract

Using multiple spatial modalities has proven helpful in improving semantic segmentation performance. However, two real-world challenges have yet to be addressed: (a) improving label efficiency and (b) enhancing robustness in realistic scenarios where modalities are missing at test time. To address these challenges, we first propose a simple yet efficient multi-modal fusion mechanism, Linear Fusion, which performs better than state-of-the-art multi-modal models even with limited supervision. Second, we propose M3L: Multi-modal Teacher for Masked Modality Learning, a semi-supervised framework that not only improves multi-modal performance but also uses unlabeled data to make the model robust to realistic missing-modality scenarios. We create the first benchmark for semi-supervised multi-modal semantic segmentation and report robustness to missing modalities. Our proposal shows an absolute improvement of up to 5% in robust mIoU over the most competitive baselines. Our project page is at https://harshm121.github.io/projects/m3l.html
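To make the two ideas concrete, below is a minimal, hypothetical sketch of linearly fusing two modality streams and randomly masking a modality during training. The layer design, per-modality scalar weights, masking probability, and all names here are our illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class LinearFusion(nn.Module):
    """Illustrative sketch: fuse per-modality features with a learned
    linear combination (assumed design, not the paper's exact layer)."""
    def __init__(self):
        super().__init__()
        # One learnable scalar weight per modality.
        self.w = nn.Parameter(torch.ones(2))

    def forward(self, rgb_feat, depth_feat):
        # Softmax keeps the fusion weights normalized and positive.
        alpha = torch.softmax(self.w, dim=0)
        return alpha[0] * rgb_feat + alpha[1] * depth_feat

def mask_modality(rgb_feat, depth_feat, p=0.5):
    """With probability p, zero out one randomly chosen modality; a
    hypothetical stand-in for masked modality learning, which exposes
    the student to missing-modality inputs during training."""
    if torch.rand(()) < p:
        if torch.rand(()) < 0.5:
            rgb_feat = torch.zeros_like(rgb_feat)
        else:
            depth_feat = torch.zeros_like(depth_feat)
    return rgb_feat, depth_feat

# Usage sketch: mask, then fuse features before the segmentation decoder.
fusion = LinearFusion()
rgb = torch.randn(1, 64, 32, 32)    # RGB encoder features
depth = torch.randn(1, 64, 32, 32)  # depth encoder features
rgb_m, depth_m = mask_modality(rgb, depth)
fused = fusion(rgb_m, depth_m)

In a semi-supervised setup like the one the abstract describes, the unmasked multi-modal teacher would supervise a student that sees the masked inputs; that pairing is what would confer the missing-modality robustness.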

Cite

Text

Maheshwari et al. "Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation." Winter Conference on Applications of Computer Vision, 2024.

Markdown

[Maheshwari et al. "Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation." Winter Conference on Applications of Computer Vision, 2024.](https://mlanthology.org/wacv/2024/maheshwari2024wacv-missing/)

BibTeX

@inproceedings{maheshwari2024wacv-missing,
  title     = {{Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation}},
  author    = {Maheshwari, Harsh and Liu, Yen-Cheng and Kira, Zsolt},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2024},
  pages     = {1020--1030},
  url       = {https://mlanthology.org/wacv/2024/maheshwari2024wacv-missing/}
}