Learning Cross-Modal Retrieval with Noisy Labels

Peng Hu, Xi Peng, Hongyuan Zhu, Liangli Zhen, Jie Lin

CVPR 2021 pp. 5403-5413

doi:10.1109/CVPR46437.2021.00536

Abstract

Recently, cross-modal retrieval has been emerging with the help of deep multimodal learning. However, even for unimodal data, collecting large-scale well-annotated data is expensive and time-consuming, not to mention the additional challenges from multiple modalities. Although crowd-sourced annotation, e.g., Amazon's Mechanical Turk, can be utilized to mitigate the labeling cost, it leads to unavoidable noise in the labels due to non-expert annotators. To tackle this challenge, this paper presents a general Multimodal Robust Learning framework (MRL) for learning with multimodal noisy labels to mitigate noisy samples and correlate distinct modalities simultaneously. To be specific, we propose a Robust Clustering loss (RC) to make the deep networks focus on clean samples instead of noisy ones. Besides, a simple yet effective multimodal loss function, called Multimodal Contrastive loss (MC), is proposed to maximize the mutual information between different modalities, thus alleviating the interference of noisy samples and cross-modal discrepancy. Extensive experiments are conducted on four widely-used multimodal datasets to demonstrate the effectiveness of the proposed approach by comparing it with 14 state-of-the-art methods.

Cite

Text

Hu et al. "Learning Cross-Modal Retrieval with Noisy Labels." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00536

Markdown

[Hu et al. "Learning Cross-Modal Retrieval with Noisy Labels." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/hu2021cvpr-learning-a/) doi:10.1109/CVPR46437.2021.00536

BibTeX

@inproceedings{hu2021cvpr-learning-a,
  title     = {{Learning Cross-Modal Retrieval with Noisy Labels}},
  author    = {Hu, Peng and Peng, Xi and Zhu, Hongyuan and Zhen, Liangli and Lin, Jie},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {5403-5413},
  doi       = {10.1109/CVPR46437.2021.00536},
  url       = {https://mlanthology.org/cvpr/2021/hu2021cvpr-learning-a/}
}