Duplicate Multi-Modal Entities Detection with Graph Contrastive Self-Training Network

Abstract

Duplicate multi-modal entities detection aims to find highly similar entities among massive collections of entities with multi-modal information. It is a basic task in many applications and is becoming more important and urgent with the development of the Internet and e-commerce platforms. Traditional methods apply machine learning or deep learning to feature embeddings extracted from multi-modal information, which ignores the correlations among entities and modalities. Inspired by the popular Graph Neural Networks (GNNs), we can analyze the multi-relation graph of entities constructed from their multi-modal information with a GNN. However, this solution still faces the challenge of extreme label sparsity, particularly in industrial applications. In this work, we propose a novel graph contrastive self-training network model, named CT-GNN, for duplicate multi-modal entities detection under extreme label sparsity. Given the multi-relation graph of entities, constructed from their multi-modal features with KNN, we first learn preliminary node embeddings with an existing GNN, e.g., a GCN. To alleviate the problem of extremely sparse labels, we design a layer contrastive module to effectively exploit implicit label information, as well as a pseudo-label extension module to determine the label boundary. In addition, graph structure learning is introduced to refine the structure of the multi-relation graph. A unified optimization framework is designed to seamlessly integrate these three components. Extensive experiments on real datasets, in comparison with SOTA baselines, demonstrate the effectiveness of our proposed method.
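
The first two steps named in the abstract, building a multi-relation graph from per-modality features with KNN and learning preliminary node embeddings with a GCN, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the cosine similarity measure, the value of `k`, the random features, the concatenated input, the shared weight matrix, and the mean over relations are all assumptions made for illustration. The contrastive, pseudo-label, and structure-learning modules are not shown.

```python
import numpy as np


def knn_graph(features, k):
    """Symmetric 0/1 adjacency linking each row to its k most cosine-similar rows."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # never pick a node as its own neighbor
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    n = features.shape[0]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), neighbors.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # symmetrize: keep an edge if either end chose it


def gcn_layer(adj, h, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ h @ weight, 0.0)


rng = np.random.default_rng(0)
image_feats = rng.normal(size=(6, 8))  # hypothetical image embeddings
text_feats = rng.normal(size=(6, 5))   # hypothetical text embeddings

# One relation (adjacency matrix) per modality: an assumed reading of "multi-relation graph".
relations = {
    "image": knn_graph(image_feats, k=2),
    "text": knn_graph(text_feats, k=2),
}

# Assumed combination: shared weights, mean of the per-relation GCN outputs.
h0 = np.concatenate([image_feats, text_feats], axis=1)
weight = rng.normal(size=(h0.shape[1], 4))
embeddings = np.mean([gcn_layer(a, h0, weight) for a in relations.values()], axis=0)
```

Keeping one adjacency matrix per modality preserves the separate relations; how the relations are combined in CT-GNN is not specified in the abstract, so the averaging above is only a placeholder.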

Cite

Text

Gu et al. "Duplicate Multi-Modal Entities Detection with Graph Contrastive Self-Training Network." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43415-0_38

Markdown

[Gu et al. "Duplicate Multi-Modal Entities Detection with Graph Contrastive Self-Training Network." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/gu2023ecmlpkdd-duplicate/) doi:10.1007/978-3-031-43415-0_38

BibTeX

@inproceedings{gu2023ecmlpkdd-duplicate,
  title     = {{Duplicate Multi-Modal Entities Detection with Graph Contrastive Self-Training Network}},
  author    = {Gu, Shuyun and Wang, Xiao and Shi, Chuan},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2023},
  pages     = {651--665},
  doi       = {10.1007/978-3-031-43415-0_38},
  url       = {https://mlanthology.org/ecmlpkdd/2023/gu2023ecmlpkdd-duplicate/}
}