Noisy Correspondence Learning with Meta Similarity Correction

Abstract

Multimodal learning has made remarkable progress in cross-modal retrieval, but this progress relies on correct correspondence among multimedia data. However, collecting such ideal data is expensive and time-consuming. In practice, most widely used datasets are harvested from the Internet and inevitably contain mismatched pairs. Training on such noisy correspondence datasets degrades performance because cross-modal retrieval methods can wrongly force mismatched pairs to be similar. To tackle this problem, we propose a Meta Similarity Correction Network (MSCN) that provides reliable similarity scores. We view a binary classification task as the meta-process, which encourages the MSCN to learn discrimination from positive and negative meta-data. To further alleviate the influence of noise, we design an effective data purification strategy that uses meta-data as prior knowledge to remove noisy samples. Extensive experiments on Flickr30K, MS-COCO, and Conceptual Captions demonstrate the strengths of our method under both synthetic and real-world noise.
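The intuition behind treating similarity correction as binary classification on meta-data can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' MSCN. It assumes a one-parameter-plus-bias logistic "corrector" that maps a raw image-text similarity score to a probability of correct correspondence, trained only on a small clean meta-set of matched (label 1) and mismatched (label 0) pairs. It also assumes a hypothetical purification rule that drops training pairs whose corrected score falls below 0.5. The paper's actual network, bi-level meta-learning procedure, and purification criterion are not reproduced here; all function names, scores, and thresholds are illustrative.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_similarity_corrector(meta_scores, meta_labels, lr=0.5, epochs=2000):
    """Hypothetical corrector: corrected(s) = sigmoid(w * s + b).

    Fitted by full-batch gradient descent on binary cross-entropy, using
    positive (matched, 1) and negative (mismatched, 0) meta-data only.
    """
    w, b = 0.0, 0.0
    n = len(meta_scores)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for s, y in zip(meta_scores, meta_labels):
            p = sigmoid(w * s + b)
            grad_w += (p - y) * s  # d(BCE)/dw for one sample
            grad_b += p - y        # d(BCE)/db for one sample
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def purify(pairs, w, b, threshold=0.5):
    """Hypothetical purification: keep (pair_id, raw_score) items whose
    corrected score reaches the threshold; drop the rest as likely noise."""
    return [(pid, s) for pid, s in pairs if sigmoid(w * s + b) >= threshold]

# Illustrative clean meta-data: raw similarity scores with match labels.
meta_scores = [0.9, 0.8, 0.75, 0.7, 0.35, 0.3, 0.2, 0.1]
meta_labels = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = fit_similarity_corrector(meta_scores, meta_labels)

# Illustrative noisy training pairs (id, raw similarity).
noisy_pairs = [("a", 0.85), ("b", 0.15), ("c", 0.05), ("d", 0.95)]
kept = purify(noisy_pairs, w, b)
print(kept)  # keeps the high-similarity pairs "a" and "d"
```

In this toy, the meta-data alone decide where the corrected score crosses 0.5, which is the sense in which clean positive and negative examples act as prior knowledge. A real system would learn a richer correction from embeddings rather than a single scalar score.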

Cite

Text

Han et al. "Noisy Correspondence Learning with Meta Similarity Correction." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00726

Markdown

[Han et al. "Noisy Correspondence Learning with Meta Similarity Correction." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/han2023cvpr-noisy/) doi:10.1109/CVPR52729.2023.00726

BibTeX

@inproceedings{han2023cvpr-noisy,
  title     = {{Noisy Correspondence Learning with Meta Similarity Correction}},
  author    = {Han, Haochen and Miao, Kaiyao and Zheng, Qinghua and Luo, Minnan},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {7517-7526},
  doi       = {10.1109/CVPR52729.2023.00726},
  url       = {https://mlanthology.org/cvpr/2023/han2023cvpr-noisy/}
}