Gradformer: Graph Transformer with Exponential Decay

Liu, Chuang; Yao, Zelin; Zhan, Yibing; Ma, Xueqi; Pan, Shirui; Hu, Wenbin

doi:10.24963/ijcai.2024/240

IJCAI 2024 pp. 2171-2179

Abstract

Text-to-image person retrieval (TIPR) aims to find images of the same identity that match a given text description. Current TIPR methods mainly focus on mining the association between images and texts, ignoring their potential complementarity. Besides, existing matching losses treat all positive pairs from the same identity equally, leading to noisy correspondences. In this paper, we propose CoRL: a cross-modal Collaborative Representation Learning framework designed to improve TIPR by effectively leveraging the complementarity between modalities. The text typically contains identity details with less noise, which helps distinguish visually similar pedestrians. This inspires us to integrate it into the corresponding image to emphasize identity-related and modality-shared visual information. However, corresponding text for each image is not always available, especially during inference. Accordingly, we introduce a Virtual-text Embedding Synthesizer that generates high-quality virtual-text features for cross-modal collaboration, eliminating the need for actual texts. We then design a Cross-Modal Collaboration learning process, incorporating a Cross-modal Relation Consistency loss to promote interaction and fusion between image and virtual-text features for mutual enhancement. Additionally, an Identity-bounded Matching loss is proposed to handle different types of image-text pairs distinctly, leading to more accurate cross-modal correspondences. Extensive experiments on multiple benchmarks demonstrate the superiority of CoRL over existing TIPR methods.

Cite

Text

Liu et al. "Gradformer: Graph Transformer with Exponential Decay." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/240

Markdown

[Liu et al. "Gradformer: Graph Transformer with Exponential Decay." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/liu2024ijcai-gradformer/) doi:10.24963/ijcai.2024/240

BibTeX

@inproceedings{liu2024ijcai-gradformer,
  title     = {{Gradformer: Graph Transformer with Exponential Decay}},
  author    = {Liu, Chuang and Yao, Zelin and Zhan, Yibing and Ma, Xueqi and Pan, Shirui and Hu, Wenbin},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {2171--2179},
  doi       = {10.24963/ijcai.2024/240},
  url       = {https://mlanthology.org/ijcai/2024/liu2024ijcai-gradformer/}
}