DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification

Abstract

Current multi-modal object re-identification approaches based on large-scale pre-trained backbones (i.e., ViT) have made remarkable progress and achieved excellent performance. However, these methods usually adopt the standard full fine-tuning paradigm, which requires optimizing a considerable number of backbone parameters and incurs extensive computational and storage costs. In this work, we propose an efficient prompt-tuning framework tailored for multi-modal object re-identification, dubbed DMPT, which freezes the main backbone and optimizes only several newly added decoupled modality-aware parameters. Specifically, we explicitly decouple the visual prompts into modality-specific prompts, which leverage prior modality knowledge from a powerful text encoder, and modality-independent semantic prompts, which extract semantic information from multi-modal inputs such as visible, near-infrared, and thermal-infrared. Built upon the extracted features, we further design a Prompt Inverse Bind (PromptIBind) strategy that employs bind prompts as a medium to connect the semantic prompt tokens of different modalities and facilitates the exchange of complementary multi-modal information, boosting final re-identification results. Experimental results on multiple common benchmarks demonstrate that DMPT achieves results competitive with existing state-of-the-art methods while requiring only 6.5% fine-tuning of the backbone parameters.
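The core idea of the abstract, a frozen backbone with a small set of trainable prompt parameters, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The names `Param`, `build_model`, `prepend_prompts`, `trainable_ratio`, and `sgd_step`, the modality labels `rgb`, `nir`, and `tir`, and all sizes are hypothetical. It is also an assumption that prompts are prepended to the patch token sequence, as in generic visual prompt tuning. The toy sizes do not reproduce the 6.5% figure.

```python
import random
from dataclasses import dataclass


@dataclass
class Param:
    """A named block of weights plus a flag saying whether it is trained."""
    name: str
    values: list
    trainable: bool


def _make(name, size, trainable, rng):
    return Param(name, [rng.uniform(-0.02, 0.02) for _ in range(size)], trainable)


def build_model(embed_dim=16, depth=4, prompt_len=2,
                modalities=("rgb", "nir", "tir"), seed=0):
    """Toy parameter set: frozen backbone blocks plus trainable prompts.

    All sizes and names are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    params = []
    # Frozen stand-in for the pre-trained backbone (one weight matrix per block).
    for i in range(depth):
        params.append(_make(f"backbone.block{i}", embed_dim * embed_dim, False, rng))
    # Decoupled prompts: one modality-specific and one semantic set per modality.
    for m in modalities:
        params.append(_make(f"prompt.specific.{m}", prompt_len * embed_dim, True, rng))
        params.append(_make(f"prompt.semantic.{m}", prompt_len * embed_dim, True, rng))
    # A shared set of bind prompts intended to link the semantic prompts.
    params.append(_make("prompt.bind", prompt_len * embed_dim, True, rng))
    return params


def prepend_prompts(patch_tokens, prompt_tokens):
    """Place prompt tokens in front of the patch tokens (assumed layout)."""
    return list(prompt_tokens) + list(patch_tokens)


def trainable_ratio(params):
    """Ratio of trainable prompt weights to frozen backbone weights."""
    trainable = sum(len(p.values) for p in params if p.trainable)
    frozen = sum(len(p.values) for p in params if not p.trainable)
    return trainable / frozen


def sgd_step(params, grads, lr=0.1):
    """Plain gradient step that skips every frozen parameter block."""
    for p in params:
        if p.trainable:
            p.values = [v - lr * g for v, g in zip(p.values, grads[p.name])]
```

In this sketch, `sgd_step` leaves every `backbone.*` block untouched and updates only the `prompt.*` blocks, so only the prompt weights would need to be stored per task. `trainable_ratio` reports the prompt-to-backbone size ratio for the toy configuration, which is one possible reading of a figure like the one quoted in the abstract.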

Cite

Text

Lin et al. "DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification." Winter Conference on Applications of Computer Vision, 2025.

Markdown

[Lin et al. "DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification." Winter Conference on Applications of Computer Vision, 2025.](https://mlanthology.org/wacv/2025/lin2025wacv-dmpt/)

BibTeX

@inproceedings{lin2025wacv-dmpt,
  title     = {{DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification}},
  author    = {Lin, Minghui and Wang, Shu and Wang, Xiang and Tang, Jianhua and Fu, Longbin and Zuo, Zhengrong and Sang, Nong},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2025},
  pages     = {2103-2112},
  url       = {https://mlanthology.org/wacv/2025/lin2025wacv-dmpt/}
}