Cross-Modality Transformer for Visible-Infrared Person Re-Identification

Abstract

Visible-infrared person re-identification (VI-ReID) is a challenging task due to large cross-modality discrepancies and intra-class variations. Existing works mainly focus on learning modality-shared representations by embedding different modalities into the same feature space. However, these methods usually degrade the modality-specific information and identification information contained in the features. To alleviate these issues, we propose a novel Cross-Modality Transformer (CMT) that jointly explores a modality-level alignment module and an instance-level alignment module for VI-ReID. The proposed CMT has two main merits. First, the modality-level alignment module is designed to compensate for missing modality-specific information via a Transformer encoder-decoder architecture. Second, the instance-level alignment module adaptively adjusts the sample features through query-adaptive feature modulation. To the best of our knowledge, this is the first work to exploit a cross-modality transformer to achieve modality compensation for VI-ReID. Extensive experiments on two standard benchmarks demonstrate that CMT performs favorably against state-of-the-art methods.
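The abstract does not spell out how query-adaptive feature modulation is computed. As a rough illustration of the general idea only, the sketch below assumes a FiLM-style form in which a query feature produces a per-channel scale and shift that are applied to a sample feature. The function names (`linear`, `modulate`), the weight layout, and the `2 * sigmoid` scaling are all hypothetical choices made for this example and are not taken from the paper.

```python
import math


def linear(weights, bias, x):
    """Compute y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def modulate(sample_feat, query_feat, w_scale, b_scale, w_shift, b_shift):
    """FiLM-style modulation (an assumption, not the paper's formulation).

    The query feature is mapped to a per-channel scale in (0, 2) and a
    per-channel shift, which are then applied to the sample feature.
    """
    scale = [2.0 * sigmoid(v) for v in linear(w_scale, b_scale, query_feat)]
    shift = linear(w_shift, b_shift, query_feat)
    return [s * f + t for s, f, t in zip(scale, sample_feat, shift)]


# Toy 3-dimensional features (made-up values, for illustration only).
dim = 3
zero_w = [[0.0] * dim for _ in range(dim)]
zero_b = [0.0] * dim
identity_w = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]

sample = [1.0, -2.0, 0.5]
query = [0.3, 0.1, -0.7]

# With all-zero weights the scale is 1 and the shift is 0, so the sample
# feature passes through unchanged.
unchanged = modulate(sample, query, zero_w, zero_b, zero_w, zero_b)

# With an identity shift projection the query is added to the sample.
shifted = modulate(sample, query, zero_w, zero_b, identity_w, zero_b)
```

In a trained model the scale and shift projections would be learned, so the same gallery sample could be adjusted differently for each query. Whether CMT uses this exact form is not stated in the abstract.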

Cite

Text

Jiang et al. "Cross-Modality Transformer for Visible-Infrared Person Re-Identification." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19781-9_28

Markdown

[Jiang et al. "Cross-Modality Transformer for Visible-Infrared Person Re-Identification." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/jiang2022eccv-crossmodality/) doi:10.1007/978-3-031-19781-9_28

BibTeX

@inproceedings{jiang2022eccv-crossmodality,
  title     = {{Cross-Modality Transformer for Visible-Infrared Person Re-Identification}},
  author    = {Jiang, Kongzhu and Zhang, Tianzhu and Liu, Xiang and Qian, Bingqiao and Zhang, Yongdong and Wu, Feng},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-19781-9_28},
  url       = {https://mlanthology.org/eccv/2022/jiang2022eccv-crossmodality/}
}