OGP-Net: Optical Guidance Meets Pixel-Level Contrastive Distillation for Robust Multi-Modal and Missing Modality Segmentation
Abstract
Enhancing the performance of semantic segmentation models with multi-spectral images (RGB-IR) is crucial, particularly for low-light and adverse environments. While multi-modal fusion techniques aim to learn cross-modality features for generating fused images or engage in knowledge distillation, they often treat multi-modal and missing-modality scenarios as separate challenges, which is suboptimal. To address this, a novel multi-modal fusion approach called Optically-Guided Pixel-level contrastive learning Network (OGP-Net) is proposed, which uses Distillation with Multi-View Contrastive (DMC) and Distillation for Uni-modal Retention (DUR) to maintain the correlation between modality-shared and modality-specific features. DMC aligns the uni-modal features by projecting the semantic information across modalities into a unified latent space, ensuring that the feature maps retain multi-modal representations. Pixel-level multi-view contrastive learning is introduced to enable modality-invariant representation learning. To retain modality-specific information, DUR is proposed, which distills detailed textures from RGB images into the optical branch of OGP-Net. Additionally, the Gated Spectral Unit (GSU) is integrated into the framework to eliminate the need for manual tuning and avoid forced feature alignment. Comprehensive experiments show that OGP-Net outperforms state-of-the-art models in multi-modal and missing-modality scenarios across three public benchmark datasets. It achieves quicker convergence and learns efficiently from limited training samples.
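The pixel-level multi-view contrastive objective described above can be illustrated with a minimal sketch: an InfoNCE-style loss over per-pixel embeddings, where the same spatial location in the RGB and IR feature maps forms a positive pair and all other locations act as negatives. This is a hedged, hypothetical NumPy illustration of the general technique, not the paper's exact loss; the function name, shapes, and temperature value are assumptions.

```python
import numpy as np

def pixel_contrastive_loss(feat_a, feat_b, temperature=0.1):
    """Pixel-level cross-modal InfoNCE sketch (hypothetical, not OGP-Net's exact loss).

    feat_a, feat_b: feature maps of shape (C, H, W) from two modality branches
    (e.g. RGB and IR). Each pixel embedding in feat_a is pulled toward the
    same-location pixel in feat_b and pushed away from all other locations.
    """
    c, h, w = feat_a.shape
    za = feat_a.reshape(c, h * w).T  # (N, C) with N = H*W pixel embeddings
    zb = feat_b.reshape(c, h * w).T
    # L2-normalize so similarities are cosine similarities
    za = za / (np.linalg.norm(za, axis=1, keepdims=True) + 1e-8)
    zb = zb / (np.linalg.norm(zb, axis=1, keepdims=True) + 1e-8)
    logits = za @ zb.T / temperature  # (N, N); diagonal holds positive pairs
    # numerically stable log-softmax over each row
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

When the two views carry aligned (modality-invariant) information, the diagonal similarities dominate and the loss shrinks, which is the behavior such an objective is meant to encourage.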
Cite
Text
Sikdar et al. "OGP-Net: Optical Guidance Meets Pixel-Level Contrastive Distillation for Robust Multi-Modal and Missing Modality Segmentation." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I7.32743
Markdown
[Sikdar et al. "OGP-Net: Optical Guidance Meets Pixel-Level Contrastive Distillation for Robust Multi-Modal and Missing Modality Segmentation." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/sikdar2025aaai-ogp/) doi:10.1609/AAAI.V39I7.32743
BibTeX
@inproceedings{sikdar2025aaai-ogp,
title = {{OGP-Net: Optical Guidance Meets Pixel-Level Contrastive Distillation for Robust Multi-Modal and Missing Modality Segmentation}},
author = {Sikdar, Aniruddh and Teotia, Jayant and Sundaram, Suresh},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {6922-6930},
doi = {10.1609/AAAI.V39I7.32743},
url = {https://mlanthology.org/aaai/2025/sikdar2025aaai-ogp/}
}