Cross-Modal Person Search: A Coarse-to-Fine Framework Using Bi-Directional Text-Image Matching

Yu, Xiaojing; Chen, Tianlong; Yang, Yang; Mugo, Michael; Wang, Zhangyang

doi:10.1109/ICCVW.2019.00223

ICCVW 2019 pp. 1799-1804

Abstract

Searching for person images in a gallery based on natural language descriptions remains a challenging and under-explored cross-modal retrieval problem. Re-ranking is known to be an effective post-processing tool for improving the accuracy of image-based retrieval tasks such as person re-identification (Person Re-Id). In this paper, we extend re-ranking from uni-modal to cross-modal retrieval for the first time and develop a bi-directional coarse-to-fine framework (BCF) for cross-modal person search. Built on a recent state-of-the-art Person Re-Id model, BCF exploits first text-to-image and then image-to-text relevance in a two-stage refinement. On the newly introduced WIDER Person Search dataset, BCF improves on a strong baseline, boosting validation set performance by 9.01% (top-1) / 3.87% (mAP) on val1 and 6.60% (top-1) / 3.49% (mAP) on val2. Our solution achieves a high score and ranks competitively in the ICCV 2019 WIDER Person Search by Language Challenge.
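The abstract does not specify how BCF scores or fuses the two directions. As a rough illustration of the general idea of bi-directional coarse-to-fine re-ranking (not the authors' BCF), the Python sketch below assumes a precomputed text-image similarity matrix, keeps the top `k` images per text query as the coarse stage, and re-scores those candidates with a reciprocal-rank bonus reflecting how highly each candidate image ranks the query text among all texts. The function names, `k`, `lam`, the reciprocal-rank fusion rule, and the toy data are all hypothetical; the top-1 and mAP helpers follow the standard definitions of the metrics reported above.

```python
import numpy as np


def bidirectional_rerank(sim, k=10, lam=0.5):
    """Coarse-to-fine re-ranking sketch for text-to-image retrieval.

    sim: (num_texts, num_images) array; larger sim[t, i] means text t and
         image i match better.
    k:   number of coarse candidates re-scored per query (hypothetical).
    lam: weight of the image-to-text bonus (hypothetical).
    Returns one array of image indices per text query, best first.
    """
    num_texts, num_images = sim.shape
    k = min(k, num_images)

    # rev_rank[t, i] = position of text t when image i ranks all texts by
    # similarity (0 = image i's most similar text).
    order_by_image = np.argsort(-sim, axis=0)
    rev_rank = np.empty_like(order_by_image)
    rev_rank[order_by_image, np.arange(num_images)] = np.arange(num_texts)[:, None]

    rankings = []
    for t in range(num_texts):
        forward = np.argsort(-sim[t], kind="stable")
        # Stage 1 (coarse): keep the top-k images by text-to-image similarity.
        coarse, rest = forward[:k], forward[k:]
        # Stage 2 (fine): boost candidates that also rank text t highly in the
        # image-to-text direction, using a reciprocal-rank bonus.
        fused = sim[t, coarse] + lam / (1.0 + rev_rank[t, coarse])
        refined = coarse[np.argsort(-fused, kind="stable")]
        # Images outside the top-k keep their original order after the refined block.
        rankings.append(np.concatenate([refined, rest]))
    return rankings


def top1_accuracy(rankings, relevant):
    """Fraction of queries whose first result is a relevant image."""
    return float(np.mean([r[0] in rel for r, rel in zip(rankings, relevant)]))


def mean_average_precision(rankings, relevant):
    """Mean over queries of average precision; relevant is a set per query."""
    aps = []
    for ranking, rel in zip(rankings, relevant):
        hits, precision_sum = 0, 0.0
        for pos, idx in enumerate(ranking, start=1):
            if idx in rel:
                hits += 1
                precision_sum += hits / pos
        aps.append(precision_sum / len(rel) if rel else 0.0)
    return float(np.mean(aps))


sim = np.array([[0.9, 0.8], [0.95, 0.1]])
gt = [{1}, {0}]  # hypothetical ground truth: text 0 -> image 1, text 1 -> image 0
print(top1_accuracy(bidirectional_rerank(sim, k=2, lam=0.0), gt))  # 0.5
print(top1_accuracy(bidirectional_rerank(sim, k=2, lam=0.5), gt))  # 1.0
```

On this toy matrix, text 0's top result switches from image 0 to image 1 once the image-to-text bonus is applied, because image 1 ranks text 0 first while image 0 prefers text 1; setting `lam=0` recovers the plain text-to-image ranking. Restricting the second stage to the top `k` candidates keeps the refinement cheap, which is the usual motivation for a coarse-to-fine design.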


Cite

Text

Yu et al. "Cross-Modal Person Search: A Coarse-to-Fine Framework Using Bi-Directional Text-Image Matching." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00223

Markdown

[Yu et al. "Cross-Modal Person Search: A Coarse-to-Fine Framework Using Bi-Directional Text-Image Matching." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/yu2019iccvw-crossmodal/) doi:10.1109/ICCVW.2019.00223

BibTeX

@inproceedings{yu2019iccvw-crossmodal,
  title     = {{Cross-Modal Person Search: A Coarse-to-Fine Framework Using Bi-Directional Text-Image Matching}},
  author    = {Yu, Xiaojing and Chen, Tianlong and Yang, Yang and Mugo, Michael and Wang, Zhangyang},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {1799-1804},
  doi       = {10.1109/ICCVW.2019.00223},
  url       = {https://mlanthology.org/iccvw/2019/yu2019iccvw-crossmodal/}
}