Self-Supervised Multi-Modal Knowledge Graph Contrastive Hashing for Cross-Modal Search

Abstract

Deep cross-modal hashing provides an effective and efficient unified representation learning solution for cross-modal search. However, existing methods neglect the implicit fine-grained multi-modal knowledge relations between modalities, such as when an image contains information that is not directly described in the text. To tackle this problem, we propose a novel self-supervised multi-grained multi-modal knowledge graph contrastive hashing method for cross-modal search (CMGCH). Firstly, to capture implicit fine-grained cross-modal semantic associations, a multi-modal knowledge graph is constructed, which represents the implicit multi-modal knowledge relations between image and text as inter-modal and intra-modal semantic associations. Secondly, a cross-modal graph contrastive attention network is proposed to reason over the multi-modal knowledge graph and fully learn the implicit fine-grained inter-modal and intra-modal knowledge relations. Thirdly, a cross-modal multi-granularity contrastive embedding learning mechanism is proposed, which fuses the global coarse-grained and local fine-grained embeddings through a multi-head attention mechanism for inter-modal and intra-modal contrastive learning, thereby enhancing the cross-modal unified representations with stronger discriminativeness and semantic-consistency-preserving power. Through the joint training of intra-modal and inter-modal contrast, the modality-invariant and modality-specific information of the different modalities is preserved in the final unified cross-modal hash space. Extensive experiments on several cross-modal benchmark datasets demonstrate that the proposed CMGCH outperforms state-of-the-art methods.
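The abstract's core mechanism (fusing a global coarse-grained embedding with local fine-grained embeddings via multi-head attention, then training relaxed hash codes with joint inter-modal and intra-modal contrast) can be sketched in a few lines of PyTorch. The snippet below is an illustrative reconstruction under our own assumptions, not the authors' implementation: the module name GranularityFusion, the InfoNCE form of the contrastive loss, the residual fusion, and the weighting lam are all assumptions.

# Illustrative sketch of multi-granularity contrastive hashing; names and
# hyperparameters are assumptions, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GranularityFusion(nn.Module):
    """Fuse a global coarse-grained embedding with local fine-grained
    embeddings via multi-head attention, then map to relaxed hash codes."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, n_bits: int = 64):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.hash_head = nn.Linear(d_model, n_bits)

    def forward(self, global_emb, local_embs):
        # global_emb: (B, d_model); local_embs: (B, L, d_model)
        q = global_emb.unsqueeze(1)                      # global embedding as query
        fused, _ = self.attn(q, local_embs, local_embs)  # attend over local regions/words
        fused = fused.squeeze(1) + global_emb            # residual fusion (assumption)
        return torch.tanh(self.hash_head(fused))         # relaxed binary codes in (-1, 1)


def info_nce(a, b, temperature: float = 0.07):
    """Symmetric InfoNCE between two batches of aligned embeddings."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                     # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)   # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


def joint_loss(img_code, txt_code, img_code_aug, txt_code_aug, lam: float = 0.5):
    # Inter-modal contrast aligns image and text codes; intra-modal contrast
    # (between augmented views of the same modality) preserves modality-specific
    # information. The weight lam is an assumption.
    inter = info_nce(img_code, txt_code)
    intra = info_nce(img_code, img_code_aug) + info_nce(txt_code, txt_code_aug)
    return inter + lam * intra

At retrieval time, the relaxed codes in (-1, 1) would typically be binarized with torch.sign and compared by Hamming distance; the paper's actual quantization and training schedule may differ.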

Cite

Text

Liang et al. "Self-Supervised Multi-Modal Knowledge Graph Contrastive Hashing for Cross-Modal Search." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I12.29280

Markdown

[Liang et al. "Self-Supervised Multi-Modal Knowledge Graph Contrastive Hashing for Cross-Modal Search." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/liang2024aaai-self/) doi:10.1609/AAAI.V38I12.29280

BibTeX

@inproceedings{liang2024aaai-self,
  title     = {{Self-Supervised Multi-Modal Knowledge Graph Contrastive Hashing for Cross-Modal Search}},
  author    = {Liang, Meiyu and Du, Junping and Liang, Zhengyang and Xing, Yongwang and Huang, Wei and Xue, Zhe},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {13744--13753},
  doi       = {10.1609/AAAI.V38I12.29280},
  url       = {https://mlanthology.org/aaai/2024/liang2024aaai-self/}
}