Gradformer: Graph Transformer with Exponential Decay
Abstract
Text-to-image person retrieval (TIPR) aims to find images of the same identity that match a given text description. Current TIPR methods mainly focus on mining the association between images and texts, ignoring their potential complementarity. Besides, existing matching losses treat all positive pairs from the same identity equally, leading to noisy correspondences. In this paper, we propose CoRL, a cross-modal Collaborative Representation Learning framework designed to improve TIPR by effectively leveraging the complementarity between modalities. The text typically contains identity details with less noise, which helps distinguish visually similar pedestrians. This inspires us to integrate it into the corresponding image to emphasize identity-related and modality-shared visual information. However, the corresponding text for each image is not always available, especially during inference. Accordingly, we introduce a Virtual-text Embedding Synthesizer that generates high-quality virtual-text features for cross-modal collaboration, eliminating the need for actual texts. We then design a Cross-Modal Collaboration learning process, incorporating a Cross-modal Relation Consistency loss to promote interaction and fusion between image and virtual-text features for mutual enhancement. Additionally, an Identity-bounded Matching loss is proposed to handle different types of image-text pairs distinctly, leading to more accurate cross-modal correspondences. Extensive experiments on multiple benchmarks demonstrate the superiority of CoRL over existing TIPR methods.
Cite
Text
Liu et al. "Gradformer: Graph Transformer with Exponential Decay." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/240
Markdown
[Liu et al. "Gradformer: Graph Transformer with Exponential Decay." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/liu2024ijcai-gradformer/) doi:10.24963/ijcai.2024/240
BibTeX
@inproceedings{liu2024ijcai-gradformer,
title = {{Gradformer: Graph Transformer with Exponential Decay}},
author = {Liu, Chuang and Yao, Zelin and Zhan, Yibing and Ma, Xueqi and Pan, Shirui and Hu, Wenbin},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2024},
pages = {2171--2179},
doi = {10.24963/ijcai.2024/240},
url = {https://mlanthology.org/ijcai/2024/liu2024ijcai-gradformer/}
}