Pseudo Visible Feature Fine-Grained Fusion for Thermal Object Detection

Abstract

Thermal object detection is a critical task in fields such as surveillance and autonomous driving. Current state-of-the-art (SOTA) models typically rely on a preceding Thermal-to-Visible (T2V) translation model to obtain visible-spectrum information, followed by a cross-modality aggregation module that fuses information from both modalities. However, this fusion approach does not fully exploit the complementary visible-spectrum information that benefits thermal detection. To address this issue, we propose a novel cross-modal fusion method called Pseudo Visible Feature Fine-Grained Fusion (PFGF). Specifically, we construct a graph whose nodes are generated from multi-level thermal features and the pseudo-visible latent features produced by the T2V model, with each feature level corresponding to a subgraph. An Inter-Mamba block performs cross-modality fusion between nodes at the lowest level, while a Cascade Knowledge Integration (CKI) strategy passes the fused low-level information to the higher-level subgraphs in a cascaded manner. After several iterations of graph node updates, each subgraph outputs its own aggregated feature to the detection head. Unlike previous cross-modal fusion methods, our approach explicitly models high-level relationships between cross-modal data, effectively fusing information at different granularities. Experimental results demonstrate that our method achieves SOTA detection performance. Code is available at https://github.com/liting1018/PFGF.
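The data flow described in the abstract has four parts: per-level subgraphs, cross-modal fusion at the lowest level, cascaded integration upward, and one aggregated feature per level. A toy, pure-Python sketch can illustrate this flow. It is not the authors' implementation (the linked repository is the reference for that). The Inter-Mamba block is replaced here by plain dot-product attention, and CKI is modeled as averaging pooled lower-level features into the next level's nodes. All levels are assumed to share one feature dimension, and the names `cross_modal_update` and `fine_grained_fusion` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def average_pair(a: Vector, b: Vector) -> Vector:
    # Element-wise mean of two vectors; keeps node magnitudes bounded across iterations.
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def mean_of(vectors: List[Vector]) -> Vector:
    # Mean-pool a set of node vectors into one vector.
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_update(nodes: List[Vector], others: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for Inter-Mamba (NOT a Mamba/state-space block):
    each node attends over the other modality's nodes with dot-product
    attention, and the attended message is averaged into the node."""
    updated = []
    for node in nodes:
        weights = softmax([sum(x * y for x, y in zip(node, o)) for o in others])
        message = [sum(w * o[k] for w, o in zip(weights, others)) for k in range(len(node))]
        updated.append(average_pair(node, message))
    return updated


def fine_grained_fusion(
    thermal_levels: List[List[Vector]],
    visible_levels: List[List[Vector]],
    iterations: int = 2,
) -> List[Vector]:
    """thermal_levels[l] / visible_levels[l] hold the node vectors of subgraph l
    (0 = lowest level). Returns one aggregated vector per level."""
    thermal = [[list(n) for n in level] for level in thermal_levels]
    visible = [[list(n) for n in level] for level in visible_levels]
    for _ in range(iterations):
        # Lowest level: exchange information between thermal and pseudo-visible nodes,
        # computing both updates from the pre-update node states.
        new_thermal = cross_modal_update(thermal[0], visible[0])
        new_visible = cross_modal_update(visible[0], thermal[0])
        thermal[0], visible[0] = new_thermal, new_visible
        # Cascade (assumed CKI stand-in): pooled fused information from level l-1,
        # already updated in this iteration, is injected into every node of level l.
        for l in range(1, len(thermal)):
            carried = mean_of(thermal[l - 1] + visible[l - 1])
            thermal[l] = [average_pair(n, carried) for n in thermal[l]]
            visible[l] = [average_pair(n, carried) for n in visible[l]]
    # Readout: each subgraph produces one aggregated feature for the detection head.
    return [mean_of(t + v) for t, v in zip(thermal, visible)]


# Three levels, 2-D features; the lowest level has two nodes per modality.
thermal = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]], [[0.2, 0.8]]]
visible = [[[0.9, 0.1], [0.1, 0.9]], [[0.4, 0.6]], [[0.3, 0.7]]]
features = fine_grained_fusion(thermal, visible)
print(len(features))  # 3
```

Because the cascade runs bottom-up within each iteration, a change to the lowest-level pseudo-visible nodes reaches the top-level output after a single iteration. This mirrors the abstract's point that fused low-level information is propagated to higher-level subgraphs.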

Cite

Text

Li et al. "Pseudo Visible Feature Fine-Grained Fusion for Thermal Object Detection." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00629

Markdown

[Li et al. "Pseudo Visible Feature Fine-Grained Fusion for Thermal Object Detection." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/li2025cvpr-pseudo/) doi:10.1109/CVPR52734.2025.00629

BibTeX

@inproceedings{li2025cvpr-pseudo,
  title     = {{Pseudo Visible Feature Fine-Grained Fusion for Thermal Object Detection}},
  author    = {Li, Ting and Ye, Mao and Wu, Tianwen and Li, Nianxin and Li, Shuaifeng and Tang, Song and Ji, Luping},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {6710--6719},
  doi       = {10.1109/CVPR52734.2025.00629},
  url       = {https://mlanthology.org/cvpr/2025/li2025cvpr-pseudo/}
}