FIRM: Fusion-Injected Residual Memory Brings Token-Level Alignment to Unsupervised VI-ReID
Abstract
Unsupervised visible-infrared person re-identification (VI-ReID) presents unique challenges due to severe modality discrepancies, including heterogeneous appearance gaps, semantic granularity mismatches, and pseudo-label noise amplification intrinsic to label-free scenarios. We distill these challenges into two core problems: fine-grained semantic alignment, which necessitates explicit token-level cross-modal feature fusion, and memory fragmentation caused by noisy pseudo-label propagation. To address these issues, we propose Fusion-Injected Residual Memory (FIRM), a unified framework that integrates Vision–Semantic Prompt Fusion (VSPF), which injects multi-scale textual cues derived from CLIP and large language models into multiple layers of a vision backbone for token-wise semantic alignment, and Evolving Multi-view Cluster Memory (EMCM), which employs optimal transport–guided clustering and dynamic prototype maintenance to ensure long-term identity consistency. The framework is optimized end-to-end using an optimal transport–weighted InfoNCE loss, a multi-layer alignment regularizer, and geometric cluster regularization, all without reliance on manual annotations. Extensive experiments on benchmark VI-ReID datasets demonstrate that the proposed method substantially advances unsupervised cross-modal retrieval performance, achieving new state-of-the-art results. Ablation studies further verify the independent and synergistic effectiveness of both modules in overcoming the identified core challenges.
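The abstract names an optimal transport–weighted InfoNCE loss but gives no formula. The following is a minimal NumPy sketch of one common way such a loss can be built, not the authors' implementation: a Sinkhorn iteration turns sample-to-prototype similarities into balanced soft assignments, which then weight a prototype-level InfoNCE term. The function names (`sinkhorn`, `ot_weighted_info_nce`), the uniform marginals, and the values of `eps`, `n_iters`, and `tau` are all illustrative assumptions.

```python
import numpy as np


def sinkhorn(scores, eps=0.05, n_iters=50):
    """Entropic OT plan between N samples (rows) and K prototypes (columns).

    Assumes uniform marginals; `scores` are similarities (higher = closer).
    Returns an (N, K) matrix whose rows are soft assignments summing to 1.
    """
    n, k = scores.shape
    # Subtract the global max before exponentiating for numerical stability.
    plan = np.exp((scores - scores.max()) / eps)
    for _ in range(n_iters):
        plan /= plan.sum(axis=0, keepdims=True) * k  # each column sums to 1/k
        plan /= plan.sum(axis=1, keepdims=True) * n  # each row sums to 1/n
    return plan * n  # rescale so each row sums to 1


def ot_weighted_info_nce(features, prototypes, tau=0.07):
    """InfoNCE over cluster prototypes, weighted by OT soft assignments.

    Hypothetical formulation: the abstract does not specify the exact loss.
    """
    # L2-normalise so that dot products are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = f @ p.T                                   # (N, K)
    weights = sinkhorn(sims)                         # (N, K), rows sum to 1
    logits = sims / tau
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy between OT soft targets and the softmax over prototypes.
    return float(-(weights * log_prob).sum(axis=1).mean())


rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))    # 8 samples with 16-dim features
protos = rng.normal(size=(4, 16))   # 4 cluster prototypes
loss = ot_weighted_info_nce(feats, protos)
```

Compared with hard pseudo-labels, the soft assignments spread each sample's target over several prototypes, which is one way to dampen the effect of a noisy cluster assignment; whether FIRM does exactly this is not stated in the abstract.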
Cite
Text
Rong et al. "FIRM: Fusion-Injected Residual Memory Brings Token-Level Alignment to Unsupervised VI-ReID." Proceedings of the 17th Asian Conference on Machine Learning, 2025.
Markdown
[Rong et al. "FIRM: Fusion-Injected Residual Memory Brings Token-Level Alignment to Unsupervised VI-ReID." Proceedings of the 17th Asian Conference on Machine Learning, 2025.](https://mlanthology.org/acml/2025/rong2025acml-firm/)
BibTeX
@inproceedings{rong2025acml-firm,
title = {{FIRM: Fusion-Injected Residual Memory Brings Token-Level Alignment to Unsupervised VI-ReID}},
author = {Rong, Ze and Shen, Xiaofeng and Qin, Haoyang and Xu, Yue and Li, Hongjun and Ma, Lei},
booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
year = {2025},
pages = {1134--1149},
volume = {304},
url = {https://mlanthology.org/acml/2025/rong2025acml-firm/}
}