Mixed Patch Visible-Infrared Modality Agnostic Object Detection

Abstract

In real-world scenarios, using multiple modalities like visible (RGB) and infrared (IR) can greatly improve the performance of a predictive task such as object detection (OD). Multimodal learning is a common way to leverage these modalities, where multiple modality-specific encoders and a fusion module are used to improve performance. In this paper, we tackle a different way to employ RGB and IR modalities, where only one modality or the other is observed by a single shared vision encoder. This realistic setting requires a lower memory footprint and is more suitable for applications such as autonomous driving and surveillance, which commonly rely on RGB and IR data. However, when learning a single encoder on multiple modalities, one modality can dominate the other, producing uneven recognition results. This work investigates how to efficiently leverage RGB and IR modalities to train a common transformer-based OD vision encoder while countering the effects of modality imbalance. For this, we introduce a novel training technique to Mix Patches (MiPa) from the two modalities, in conjunction with a patch-wise modality-agnostic module, for learning a common representation of both modalities. Our experiments show that MiPa can learn a representation to reach competitive results on traditional RGB/IR benchmarks while only requiring a single modality during inference. Our code is available at: https://github.com/heitorrapela/MiPa.
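To make the idea of mixing patches from the two modalities concrete, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes spatially aligned RGB and IR images of equal size, a square patch grid, and an independent random choice per patch. The function name `mix_patches` and the parameters `patch`, `ratio`, and `seed` are hypothetical, introduced only for illustration; the abstract does not state how patches are sampled.

```python
import random


def mix_patches(rgb, ir, patch=2, ratio=0.5, seed=None):
    """Build one mixed image by taking each patch from either rgb or ir.

    rgb, ir: aligned images as nested lists (rows of pixel values), same size.
    patch:   side length of a square patch, in pixels (assumed).
    ratio:   probability that a patch comes from rgb (assumed sampling rule).
    Returns the mixed image and the patch-level mask (True means RGB).
    """
    h, w = len(rgb), len(rgb[0])
    if len(ir) != h or any(len(row) != w for row in rgb + ir):
        raise ValueError("rgb and ir must have the same height and width")
    if h % patch or w % patch:
        raise ValueError("image sides must be multiples of the patch size")

    rng = random.Random(seed)
    # One independent draw per patch: True selects RGB, False selects IR.
    take_rgb = [
        [rng.random() < ratio for _ in range(w // patch)]
        for _ in range(h // patch)
    ]
    # Copy every pixel from the modality chosen for the patch it falls in.
    mixed = [
        [(rgb if take_rgb[y // patch][x // patch] else ir)[y][x] for x in range(w)]
        for y in range(h)
    ]
    return mixed, take_rgb


if __name__ == "__main__":
    rgb = [["R"] * 4 for _ in range(4)]
    ir = [["I"] * 4 for _ in range(4)]
    mixed, mask = mix_patches(rgb, ir, patch=2, ratio=0.5, seed=0)
    for row in mixed:
        print("".join(row))
```

Under this sketch, a mixed image exposes the shared encoder to both modalities within a single training sample, while at inference the encoder would simply receive an unmixed RGB or IR image.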

Cite

Text

Medeiros et al. "Mixed Patch Visible-Infrared Modality Agnostic Object Detection." Winter Conference on Applications of Computer Vision, 2025.

Markdown

[Medeiros et al. "Mixed Patch Visible-Infrared Modality Agnostic Object Detection." Winter Conference on Applications of Computer Vision, 2025.](https://mlanthology.org/wacv/2025/medeiros2025wacv-mixed/)

BibTeX

@inproceedings{medeiros2025wacv-mixed,
  title     = {{Mixed Patch Visible-Infrared Modality Agnostic Object Detection}},
  author    = {Medeiros, Heitor R. and Latortue, David and Granger, Eric and Pedersoli, Marco},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2025},
  pages     = {9005--9014},
  url       = {https://mlanthology.org/wacv/2025/medeiros2025wacv-mixed/}
}