Towards Modality-Agnostic Person Re-Identification with Descriptive Query

Abstract

Person re-identification (ReID) with a descriptive query (text or sketch) provides an important supplement to the general image-to-image paradigm, and it is usually studied as a single cross-modality matching task, e.g., text-to-image or sketch-to-photo. However, without a camera-captured photo query, it is uncertain in practical scenarios whether a text description, a sketch, or both will be available. This motivates us to study a new and challenging modality-agnostic person re-identification problem. Towards this goal, we propose a unified person re-identification (UNIReID) architecture that can effectively adapt to both cross-modality and multi-modality tasks. Specifically, UNIReID incorporates a simple dual-encoder with task-specific modality learning to mine and fuse visual and textual modality information. To deal with imbalanced training across the different tasks in UNIReID, we propose a task-aware dynamic training strategy that adaptively adjusts the training focus according to task difficulty. In addition, we construct three multi-modal ReID datasets by collecting sketches corresponding to the photos to support this challenging task. Experimental results on the three multi-modal ReID datasets show that UNIReID greatly improves retrieval accuracy and generalization across different tasks and unseen scenarios.
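To make the abstract's two main ideas concrete, below is a minimal PyTorch sketch of (i) a dual-encoder that embeds photos/sketches and text into a shared space with a simple fused multi-modal query, and (ii) a task-aware weighting that shifts training focus toward harder (higher-loss) tasks. All names, backbones, dimensions, and the softmax-over-losses weighting are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoderSketch(nn.Module):
    """Illustrative dual-encoder: one visual encoder shared by photos and
    sketches, one text encoder, projected into a shared embedding space."""
    def __init__(self, visual_backbone, text_backbone, embed_dim=512):
        super().__init__()
        self.visual = visual_backbone      # assumed: returns a feature vector per image
        self.text = text_backbone          # assumed: returns a feature vector per caption
        self.visual_proj = nn.LazyLinear(embed_dim)
        self.text_proj = nn.LazyLinear(embed_dim)

    def encode_image(self, images):
        return F.normalize(self.visual_proj(self.visual(images)), dim=-1)

    def encode_text(self, tokens):
        return F.normalize(self.text_proj(self.text(tokens)), dim=-1)

    def encode_multimodal(self, sketches, tokens):
        # Hypothetical fusion of sketch and text embeddings for the multi-modal query.
        fused = self.encode_image(sketches) + self.encode_text(tokens)
        return F.normalize(fused, dim=-1)

def task_aware_weights(task_losses, temperature=1.0):
    """Hypothetical task-aware weighting: tasks with larger current loss
    (treated as 'harder') receive larger weights."""
    losses = torch.stack([loss.detach() for loss in task_losses])
    return torch.softmax(losses / temperature, dim=0)

def total_loss(task_losses):
    # Weighted sum over per-task losses (e.g., text-to-photo, sketch-to-photo,
    # text+sketch-to-photo), with weights recomputed every step.
    weights = task_aware_weights(task_losses)
    return sum(w * loss for w, loss in zip(weights, task_losses))

The detached softmax weights are one simple way to realize "adaptively adjusting the training focus"; the paper's actual difficulty measure and update rule may differ.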

Cite

Text

Chen et al. "Towards Modality-Agnostic Person Re-Identification with Descriptive Query." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01452

Markdown

[Chen et al. "Towards Modality-Agnostic Person Re-Identification with Descriptive Query." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/chen2023cvpr-modalityagnostic/) doi:10.1109/CVPR52729.2023.01452

BibTeX

@inproceedings{chen2023cvpr-modalityagnostic,
  title     = {{Towards Modality-Agnostic Person Re-Identification with Descriptive Query}},
  author    = {Chen, Cuiqun and Ye, Mang and Jiang, Ding},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {15128--15137},
  doi       = {10.1109/CVPR52729.2023.01452},
  url       = {https://mlanthology.org/cvpr/2023/chen2023cvpr-modalityagnostic/}
}