SiMO: Single-Modality-Operable Multimodal Collaborative Perception

Wen, Jiageng; Zhao, Shengjie; Li, Bing; Huang, Jiafeng; Ye, Kenan; Deng, Hao

ICLR 2026

Abstract

Collaborative perception integrates multi-agent perspectives to extend sensing range and overcome occlusion. While existing multimodal approaches leverage complementary sensors to improve performance, they are highly prone to failure, especially when a key sensor such as LiDAR is unavailable. The root cause is that feature fusion leads to semantic mismatches between single-modality features and the downstream modules. This paper addresses this challenge for the first time in the field of collaborative perception, introducing **Si**ngle-**M**odality-**O**perable Multimodal Collaborative Perception (**SiMO**). By adopting the proposed **L**ength-**A**daptive **M**ulti-**M**od**a**l Fusion (**LAMMA**), SiMO can adaptively handle the remaining modal features during modality failures while keeping the semantic space consistent. Additionally, through the "Pretrain-Align-Fuse-RD" training strategy, SiMO addresses modality competition, an issue generally overlooked by existing methods, ensuring that each individual modality branch remains independent. Experiments demonstrate that SiMO aligns multimodal features while preserving modality-specific features, enabling it to maintain optimal performance across all individual modalities. Implementation details are available at [https://github.com/dempsey-wen/SiMO](https://github.com/dempsey-wen/SiMO).
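The abstract does not specify how LAMMA works internally, so the following is only a toy sketch of the general idea of length-adaptive fusion, not the authors' method. It assumes each modality yields a feature vector of the same dimension, and it fuses whichever vectors are present with softmax attention against a query vector. The function name `fuse_available` and the `query` parameter are hypothetical, chosen for illustration.

```python
import math
from typing import Optional, Sequence


def fuse_available(
    features: Sequence[Optional[Sequence[float]]],
    query: Sequence[float],
) -> list[float]:
    """Attention-weighted fusion over whichever modality features are present.

    Hypothetical illustration: a missing modality is passed as None and is
    simply excluded, so the output has the same dimension either way.
    """
    present = [f for f in features if f is not None]
    if not present:
        raise ValueError("at least one modality feature is required")

    dim = len(query)
    if any(len(f) != dim for f in present):
        raise ValueError("all features must match the query dimension")

    # Scaled dot-product score of each available modality against the query.
    scale = math.sqrt(dim)
    scores = [sum(q * x for q, x in zip(query, f)) / scale for f in present]

    # Numerically stable softmax over the available modalities only.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum; with a single modality the weight is 1 and the feature
    # passes through unchanged.
    return [sum(w * f[i] for w, f in zip(weights, present)) for i in range(dim)]


lidar = [1.0, 0.0]   # assumed LiDAR feature
camera = [0.0, 1.0]  # assumed camera feature
both = fuse_available([lidar, camera], query=[1.0, 1.0])
camera_only = fuse_available([None, camera], query=[1.0, 1.0])
```

In this sketch the fused output has the same dimension whether one or two modalities are supplied, which is the property that lets downstream modules consume it unchanged. Keeping the single-modality features semantically compatible with the fused ones is the harder problem, and the abstract attributes that to the paper's alignment and training strategy.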

Cite

Text

Wen et al. "SiMO: Single-Modality-Operable Multimodal Collaborative Perception." International Conference on Learning Representations, 2026.

Markdown

[Wen et al. "SiMO: Single-Modality-Operable Multimodal Collaborative Perception." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wen2026iclr-simo/)

BibTeX

@inproceedings{wen2026iclr-simo,
  title     = {{SiMO: Single-Modality-Operable Multimodal Collaborative Perception}},
  author    = {Wen, Jiageng and Zhao, Shengjie and Li, Bing and Huang, Jiafeng and Ye, Kenan and Deng, Hao},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wen2026iclr-simo/}
}