ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer
Abstract
Attribute-based person search is the task of finding person images that best match a set of text attributes given as a query. The main challenge of this task is the large modality gap between attributes and images. To reduce the gap, we present a new loss for learning cross-modal embeddings in the context of attribute-based person search. We regard a set of attributes as a category of people sharing the same traits. In a joint embedding space of the two modalities, our loss pulls images close to their person categories for modality alignment. More importantly, it pushes pairs of person categories apart by a margin determined adaptively by their semantic distance, where the distance metric is learned end-to-end so that the loss considers the importance of each attribute when relating person categories. Our loss, guided by the adaptive semantic margin, leads to more discriminative and semantically well-arranged distributions of person images. As a consequence, it enables a simple embedding model to achieve state-of-the-art performance on public benchmarks without bells and whistles.
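The abstract describes a loss with two ingredients: an alignment term that pulls images toward the embedding of their person category, and a regularizer that pushes category embeddings apart by a margin scaled by a learned semantic distance between their attribute sets. Below is a minimal PyTorch sketch of how such an adaptive semantic margin could be realized. It is an illustration based only on the abstract, not the authors' exact formulation: the class name, the softmax-weighted attribute-disagreement distance, and the hinge form of the push term are all assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveSemanticMarginLoss(nn.Module):
    # Hypothetical sketch following the abstract's description; details
    # (weighting scheme, hinge form, base margin) are assumptions.
    def __init__(self, num_attributes, base_margin=0.5):
        super().__init__()
        # Per-attribute importance weights, learned end-to-end with the model.
        self.attr_weight = nn.Parameter(torch.ones(num_attributes))
        self.base_margin = base_margin

    def forward(self, img_emb, cat_emb, labels, cat_attrs):
        # img_emb:   (B, D) image embeddings, assumed L2-normalized
        # cat_emb:   (C, D) person-category embeddings from an attribute encoder
        # labels:    (B,) index of each image's person category
        # cat_attrs: (C, A) binary attribute vectors defining the categories

        # Alignment term: pull each image toward its own category embedding.
        align = (1.0 - F.cosine_similarity(img_emb, cat_emb[labels])).mean()

        # Semantic distance between category pairs: attribute disagreement
        # weighted by the learned importance of each attribute.
        w = torch.softmax(self.attr_weight, dim=0)                          # (A,)
        diff = (cat_attrs.unsqueeze(1) != cat_attrs.unsqueeze(0)).float()   # (C, C, A)
        sem_dist = (diff * w).sum(-1)                                       # (C, C)

        # Adaptive margin: semantically distant categories must end up
        # farther apart in the joint embedding space.
        margin = self.base_margin * sem_dist                                # (C, C)
        emb_dist = 1.0 - cat_emb @ cat_emb.t()      # cosine distances, (C, C)
        off_diag = ~torch.eye(len(cat_emb), dtype=torch.bool,
                              device=cat_emb.device)
        push = F.relu(margin - emb_dist)[off_diag].mean()

        return align + push

The key design point suggested by the abstract is that the per-attribute weights are trainable, so the semantic distance, and hence the margin between any two categories, adapts during training rather than being fixed by a hand-crafted metric such as plain Hamming distance.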
Cite
Text
Jeong et al. "ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.01180
Markdown
[Jeong et al. "ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/jeong2021iccv-asmr/) doi:10.1109/ICCV48922.2021.01180
BibTeX
@inproceedings{jeong2021iccv-asmr,
title = {{ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer}},
author = {Jeong, Boseung and Park, Jicheol and Kwak, Suha},
booktitle = {International Conference on Computer Vision},
year = {2021},
  pages = {12016--12025},
doi = {10.1109/ICCV48922.2021.01180},
url = {https://mlanthology.org/iccv/2021/jeong2021iccv-asmr/}
}