Scene-Adaptive Person Search via Bilateral Modulations
Abstract
Visible-infrared person re-identification (VIReID) retrieves pedestrian images with the same identity across different modalities. Existing methods learn visual features solely from images and therefore fail to align them in a modality-invariant semantic space. In this paper, we propose a novel framework, termed Richer Semantics, Better Alignment (RSBA), to align visual features with explicit and enriched semantics. Specifically, we first develop an Explicit Semantics-Guided Feature Alignment (ESFA) module, which supplements textual descriptions for cross-modality images and aligns image-text pairs within each modality, alleviating the distribution discrepancy of visual features. We then devise a Consistent Similarity-Guided Indirect Alignment (CSIA) module, which constrains the similarity between intra-modality image-text pairs to be consistent with that between inter-modality text-text pairs, indirectly aligning visual features with cross-modality semantics. Furthermore, we design a Cross-View Semantics Compensation (CVSC) module, which integrates multi-view texts and extends the one-to-one image-text matching in ESFA and CSIA to one-to-many matching, further strengthening the alignment of visual features within the semantic space. Extensive experimental results on three public datasets demonstrate the effectiveness and superiority of our proposed RSBA.
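The similarity-consistency idea described for CSIA can be illustrated with a small sketch. The abstract gives no formula, so everything below is an assumption made for illustration: the use of cosine similarity, the squared-gap penalty, and the hypothetical names `cosine` and `consistency_penalty`. It is not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def consistency_penalty(vis_img, vis_txt, ir_img, ir_txt):
    """Hypothetical penalty (an assumption, not the paper's loss).

    Compares each intra-modality image-text similarity with the
    inter-modality text-text similarity and sums the squared gaps.
    """
    s_vis = cosine(vis_img, vis_txt)   # visible image vs. visible text
    s_ir = cosine(ir_img, ir_txt)      # infrared image vs. infrared text
    s_txt = cosine(vis_txt, ir_txt)    # visible text vs. infrared text
    return (s_vis - s_txt) ** 2 + (s_ir - s_txt) ** 2
```

Under these assumptions the penalty is zero when both image-text similarities equal the cross-modality text-text similarity, and it grows as they diverge.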
Cite
Text
Jiang et al. "Scene-Adaptive Person Search via Bilateral Modulations." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/104
Markdown
[Jiang et al. "Scene-Adaptive Person Search via Bilateral Modulations." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/jiang2024ijcai-scene/) doi:10.24963/ijcai.2024/104
BibTeX
@inproceedings{jiang2024ijcai-scene,
title = {{Scene-Adaptive Person Search via Bilateral Modulations}},
author = {Jiang, Yimin and Wang, Huibing and Peng, Jinjia and Fu, Xianping and Wang, Yang},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2024},
pages = {938--946},
doi = {10.24963/ijcai.2024/104},
url = {https://mlanthology.org/ijcai/2024/jiang2024ijcai-scene/}
}