Fusing Two Directions in Cross-Domain Adaption for Real Life Person Search by Language

Abstract

Person search by language is an important application in video surveillance. Two difficulties make the problem non-trivial: the large visual-semantic discrepancy between images and sentences, and the cross-domain difficulty that arises in real-life applications, where pedestrian images with new identities keep emerging but have no language descriptions available for training. In this paper, we first propose a concise and effective framework for image-sentence alignment to deal with the visual-semantic discrepancy. Second, we fuse the two opposite directions, i.e., source to target and target to source, for cross-domain adaption. Extensive experiments validate the superiority of the proposed method on both the source domain and the target domain; the method achieves state-of-the-art performance and won 1st place in the competition.

Cite

Text

Niu et al. "Fusing Two Directions in Cross-Domain Adaption for Real Life Person Search by Language." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00225

Markdown

[Niu et al. "Fusing Two Directions in Cross-Domain Adaption for Real Life Person Search by Language." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/niu2019iccvw-fusing/) doi:10.1109/ICCVW.2019.00225

BibTeX

@inproceedings{niu2019iccvw-fusing,
  title     = {{Fusing Two Directions in Cross-Domain Adaption for Real Life Person Search by Language}},
  author    = {Niu, Kai and Huang, Yan and Wang, Liang},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {1815--1818},
  doi       = {10.1109/ICCVW.2019.00225},
  url       = {https://mlanthology.org/iccvw/2019/niu2019iccvw-fusing/}
}