Richer Semantics, Better Alignment: Aligning Visual Features with Explicit and Enriched Semantics for Visible-Infrared Person Re-Identification

Abstract

Visible-infrared person re-identification (VIReID) retrieves pedestrian images of the same identity across different modalities. Existing methods learn visual features solely from images and therefore fail to align them within a modality-invariant semantic space. In this paper, we propose a novel framework, termed Richer Semantics, Better Alignment (RSBA), to align visual features with explicit and enriched semantics. Specifically, we first develop an Explicit Semantics-Guided Feature Alignment (ESFA) module, which supplements cross-modality images with textual descriptions and aligns image-text pairs within each modality, alleviating the distribution discrepancy of visual features. We then devise a Consistent Similarity-Guided Indirect Alignment (CSIA) module, which constrains the similarity between intra-modality image-text pairs to be consistent with that between inter-modality text-text pairs, indirectly aligning visual features with cross-modality semantics. Furthermore, we design a Cross-View Semantics Compensation (CVSC) module, which integrates multi-view texts and extends the one-to-one image-text matching in ESFA and CSIA to one-to-many, further strengthening the alignment of visual features within the semantic space. Extensive experimental results on three public datasets demonstrate the effectiveness and superiority of the proposed RSBA.
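The similarity-consistency idea behind CSIA can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no loss formula, so the cosine similarity, the mean-squared penalty, and every name below (`cosine`, `similarity_matrix`, `consistency_loss`, and the three feature lists) are assumptions chosen only to make the constraint concrete.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(rows, cols):
    """Pairwise cosine similarities: entry [i][j] compares rows[i] with cols[j]."""
    return [[cosine(r, c) for c in cols] for r in rows]


def consistency_loss(img_feats, txt_feats, other_txt_feats):
    """Hypothetical consistency penalty (assumed form, not from the paper).

    img_feats       -- image features from one modality (e.g. visible)
    txt_feats       -- text features describing those same images
    other_txt_feats -- text features from the other modality (e.g. infrared)

    Compares intra-modality image-text similarities with inter-modality
    text-text similarities and returns their mean squared difference.
    """
    s_it = similarity_matrix(img_feats, txt_feats)        # image vs. text
    s_tt = similarity_matrix(other_txt_feats, txt_feats)  # text vs. text
    diffs = [
        (a - b) ** 2
        for row_it, row_tt in zip(s_it, s_tt)
        for a, b in zip(row_it, row_tt)
    ]
    return sum(diffs) / len(diffs)


# Toy features: two identities, two dimensions.
txt_visible = [[1.0, 0.0], [0.0, 1.0]]
txt_infrared = [[1.0, 0.0], [0.0, 1.0]]
img_aligned = [[1.0, 0.0], [0.0, 1.0]]     # matches the cross-modality texts
img_misaligned = [[0.0, 1.0], [1.0, 0.0]]  # identities swapped

loss_aligned = consistency_loss(img_aligned, txt_visible, txt_infrared)
loss_misaligned = consistency_loss(img_misaligned, txt_visible, txt_infrared)
```

Under these assumptions the penalty vanishes when the image features relate to the same-modality texts exactly as the other modality's texts do, and it grows when they do not, which is one way to read "indirectly aligning visual features with cross-modality semantics".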

Cite

Text

Dong et al. "Richer Semantics, Better Alignment: Aligning Visual Features with Explicit and Enriched Semantics for Visible-Infrared Person Re-Identification." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/104

Markdown

[Dong et al. "Richer Semantics, Better Alignment: Aligning Visual Features with Explicit and Enriched Semantics for Visible-Infrared Person Re-Identification." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/dong2025ijcai-richer/) doi:10.24963/IJCAI.2025/104

BibTeX

@inproceedings{dong2025ijcai-richer,
  title     = {{Richer Semantics, Better Alignment: Aligning Visual Features with Explicit and Enriched Semantics for Visible-Infrared Person Re-Identification}},
  author    = {Dong, Neng and Yan, Shuanglin and Zhang, Liyan and Tang, Jinhui},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {927--935},
  doi       = {10.24963/IJCAI.2025/104},
  url       = {https://mlanthology.org/ijcai/2025/dong2025ijcai-richer/}
}