M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision

Kailai Zhou, Fuqiang Yang, Shixian Wang, Bihan Wen, Chongde Zi, Linsen Chen, Qiu Shen, Xun Cao

ICCV 2025 pp. 7861-7872

Abstract

RGB-Thermal (RGBT) multispectral vision is essential for robust perception in complex environments. Most RGBT tasks follow a case-by-case research paradigm, relying on manually customized models to learn task-oriented representations. Nevertheless, this paradigm is inherently constrained by artificial inductive bias, modality bias, and data bottlenecks. To address these limitations, we make the initial attempt to build a Generalized RGBT MultiSpectral foundation model (M-SpecGene), which aims to learn modality-invariant representations from large-scale broad data in a self-supervised manner. M-SpecGene provides new insights into multispectral fusion and integrates prior case-by-case studies into a unified paradigm. Considering the information imbalance that characterizes RGBT data, we introduce the Cross-Modality Structural Sparsity (CMSS) metric to quantify the information density across the two modalities. We then develop the GMM-CMSS progressive masking strategy to facilitate a flexible, easy-to-hard, and object-centric pre-training process. Comprehensive experiments validate M-SpecGene's generalizability across eleven datasets for four RGBT downstream tasks.

Cite

Text

Zhou et al. "M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision." International Conference on Computer Vision, 2025.

Markdown

[Zhou et al. "M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/zhou2025iccv-mspecgene/)

BibTeX

@inproceedings{zhou2025iccv-mspecgene,
  title     = {{M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision}},
  author    = {Zhou, Kailai and Yang, Fuqiang and Wang, Shixian and Wen, Bihan and Zi, Chongde and Chen, Linsen and Shen, Qiu and Cao, Xun},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {7861--7872},
  url       = {https://mlanthology.org/iccv/2025/zhou2025iccv-mspecgene/}
}