Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond
Abstract
Multi-modality image fusion, particularly of infrared and visible images, plays a crucial role in integrating diverse modalities to enhance scene understanding. Although early research prioritized visual quality, preserving fine details and adapting to downstream tasks remain challenging. Recent approaches attempt task-specific designs but rarely achieve "the best of both worlds" due to inconsistent optimization goals. To address these issues, we propose SAGE, a novel method that leverages semantic knowledge from the Segment Anything Model (SAM) to improve the quality of fusion results and enable downstream-task adaptability. Specifically, we design a Semantic Persistent Attention (SPA) module that efficiently retains source information via a persistent repository while extracting high-level semantic priors from SAM. More importantly, to eliminate the impractical dependence on SAM during inference, we introduce a bi-level optimization-driven distillation mechanism with triplet losses, which allows the student network to effectively distill SAM's knowledge. Extensive experiments show that our method achieves a balance between high-quality visual results and downstream-task adaptability while maintaining practical deployment efficiency. The code is available at https://github.com/RollingPlain/SAGE_IVIF.
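The abstract only names the two mechanisms, so the sketch below is one plausible reading of them, not the authors' implementation: a persistent repository injected through cross-attention (the SPA idea), and a triplet-style distillation loss that pulls the student toward SAM-informed teacher features. All identifiers here (SPASketch, repo_slots, triplet_distillation_loss, margin) are hypothetical; the real SAGE architecture and losses differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SPASketch(nn.Module):
    """Rough sketch of a Semantic Persistent Attention-style block:
    a learnable persistent repository retains source information, and
    cross-attention injects high-level SAM priors into fused features."""

    def __init__(self, dim, repo_slots=64, num_heads=4):
        super().__init__()
        # Persistent repository: learnable slots that survive across inputs.
        self.repo = nn.Parameter(torch.randn(repo_slots, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, fused_tokens, sam_prior_tokens):
        # fused_tokens: (B, N, dim) image tokens; sam_prior_tokens: (B, M, dim).
        repo = self.repo.unsqueeze(0).expand(fused_tokens.size(0), -1, -1)
        kv = torch.cat([sam_prior_tokens, repo], dim=1)
        out, _ = self.attn(fused_tokens, kv, kv)
        return fused_tokens + out  # residual injection of semantic priors


def triplet_distillation_loss(student_feat, teacher_feat, negative_feat, margin=0.2):
    # Triplet-style distillation: pull the student toward the SAM-informed
    # teacher (positive) and away from a semantics-free baseline (negative).
    d_pos = F.mse_loss(student_feat, teacher_feat)
    d_neg = F.mse_loss(student_feat, negative_feat)
    return F.relu(d_pos - d_neg + margin)
```

Under this reading, the teacher sees SAM priors through the SPA block during training, while the student trained with the triplet loss runs without SAM at inference, which is what removes the deployment dependence the abstract describes.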
Cite
Text
Wu et al. "Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01666
Markdown
[Wu et al. "Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/wu2025cvpr-every/) doi:10.1109/CVPR52734.2025.01666
BibTeX
@inproceedings{wu2025cvpr-every,
title = {{Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond}},
author = {Wu, Guanyao and Liu, Haoyu and Fu, Hongming and Peng, Yichuan and Liu, Jinyuan and Fan, Xin and Liu, Risheng},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {17882--17891},
doi = {10.1109/CVPR52734.2025.01666},
url = {https://mlanthology.org/cvpr/2025/wu2025cvpr-every/}
}