CrossSplit: Mitigating Label Noise Memorization Through Data Splitting

Abstract

We approach the problem of improving the robustness of deep learning algorithms in the presence of label noise. Building upon existing label correction and co-teaching methods, we propose a novel training procedure to mitigate the memorization of noisy labels, called CrossSplit, which uses a pair of neural networks trained on two disjoint parts of the labeled dataset. CrossSplit combines two main ingredients: (i) Cross-split label correction. The idea is that, since the model trained on one part of the data cannot memorize example-label pairs from the other part, the training labels presented to each network can be smoothly adjusted by using the predictions of its peer network; (ii) Cross-split semi-supervised training. A network trained on one part of the data also uses the unlabeled inputs of the other part. Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state of the art across a wide range of noise ratios. The project page is at https://rlawlgul.github.io/.
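The cross-split label correction idea (i) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name and the scalar blending weight `alpha` are assumptions (the paper derives a per-sample weighting from the peer network's output), but it shows the core mechanism of softening a possibly noisy one-hot label with the prediction of a peer network that never trained on that example-label pair.

```python
import numpy as np

def cross_split_soft_label(one_hot_label, peer_probs, alpha):
    """Blend a (possibly noisy) one-hot label with the peer prediction.

    one_hot_label: (C,) one-hot vector of the given training label
    peer_probs:    (C,) softmax output of the peer network, trained on the
                   other data split, so it cannot have memorized this pair
    alpha:         blending weight in [0, 1]; a simplification of the
                   paper's per-sample weighting (hypothetical choice here)
    """
    return (1.0 - alpha) * np.asarray(one_hot_label) + alpha * np.asarray(peer_probs)

# Example: the given label says class 0, but the peer network is fairly
# confident the example belongs to class 1.
soft = cross_split_soft_label([1.0, 0.0, 0.0], [0.2, 0.7, 0.1], alpha=0.5)
print(soft)  # [0.6  0.35 0.05] -- still a valid distribution (sums to 1)
```

During training, each network would minimize the cross-entropy against these softened targets for its own split, which dampens the gradient signal from labels its peer disagrees with.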

Cite

Text

Kim et al. "CrossSplit: Mitigating Label Noise Memorization Through Data Splitting." International Conference on Machine Learning, 2023.

Markdown

[Kim et al. "CrossSplit: Mitigating Label Noise Memorization Through Data Splitting." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/kim2023icml-crosssplit/)

BibTeX

@inproceedings{kim2023icml-crosssplit,
  title     = {{CrossSplit: Mitigating Label Noise Memorization Through Data Splitting}},
  author    = {Kim, Jihye and Baratin, Aristide and Zhang, Yan and Lacoste-Julien, Simon},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {16377--16392},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/kim2023icml-crosssplit/}
}