Robust Data Pruning Under Label Noise via Maximizing Re-Labeling Accuracy
Abstract
Data pruning, which aims to downsize a large training set into a small informative subset, is crucial for reducing the enormous computational costs of modern deep learning. Though large-scale data collections invariably contain annotation noise and numerous robust learning methods have been developed, data pruning for the noise-robust learning scenario has received little attention. With state-of-the-art Re-labeling methods that self-correct erroneous labels while training, it is challenging to identify which subset induces the most accurate re-labeling of erroneous labels in the entire training set. In this paper, we formalize the problem of data pruning with re-labeling. We first show that the likelihood of a training example being correctly re-labeled is proportional to the prediction confidence of its neighborhood in the subset. Therefore, we propose a novel data pruning algorithm, Prune4Rel, that finds a subset maximizing the total neighborhood confidence of all training examples, thereby maximizing the re-labeling accuracy and generalization performance. Extensive experiments on four real and one synthetic noisy datasets show that Prune4Rel outperforms the baselines with Re-labeling models by up to 9.1% as well as those with a standard model by up to 21.6%.
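The selection idea in the abstract, choosing a subset that maximizes the total neighborhood confidence over all training examples, can be illustrated with a small greedy sketch. This is not the authors' Prune4Rel implementation. The function name `greedy_neighborhood_confidence`, the cosine-similarity threshold `tau`, and the saturation level `cap` are assumptions made for illustration. The cap is included so that a neighborhood already covered by confident examples yields diminishing gains. Embeddings are assumed to be non-zero vectors.

```python
import numpy as np

def greedy_neighborhood_confidence(embeddings, confidence, budget, tau=0.9, cap=1.0):
    """Greedily pick `budget` indices that maximize the capped
    neighborhood confidence summed over all training examples.

    Illustrative only: `tau` and `cap` are assumed hyperparameters.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    n = len(embeddings)

    # Pairwise cosine similarity between all examples.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T

    # weight[i, j]: confidence that candidate j adds to example i's
    # neighborhood (zero when j is not a neighbor of i).
    weight = (sim >= tau) * confidence[None, :]

    covered = np.zeros(n)            # neighborhood confidence per example so far
    available = np.ones(n, dtype=bool)
    selected = []
    for _ in range(budget):
        # Marginal gain in the capped objective for each candidate j.
        before = np.minimum(cap, covered)[:, None]
        after = np.minimum(cap, covered[:, None] + weight)
        gain = (after - before).sum(axis=0)
        gain[~available] = -np.inf   # never pick the same example twice
        best = int(np.argmax(gain))
        selected.append(best)
        available[best] = False
        covered += weight[:, best]
    return selected

# Toy data: three examples near one direction, two near another.
toy_embeddings = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 1.0]]
toy_confidence = [0.9, 0.5, 0.4, 0.8, 0.3]
picked = greedy_neighborhood_confidence(toy_embeddings, toy_confidence, budget=2)
```

On this toy input the sketch takes the most confident example from each cluster rather than two examples from the larger cluster, because the cap makes a second pick from an already covered neighborhood worth little.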
Cite
Text
Park et al. "Robust Data Pruning Under Label Noise via Maximizing Re-Labeling Accuracy." Neural Information Processing Systems, 2023.
Markdown
[Park et al. "Robust Data Pruning Under Label Noise via Maximizing Re-Labeling Accuracy." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/park2023neurips-robust/)
BibTeX
@inproceedings{park2023neurips-robust,
title = {{Robust Data Pruning Under Label Noise via Maximizing Re-Labeling Accuracy}},
author = {Park, Dongmin and Choi, Seola and Kim, Doyoung and Song, Hwanjun and Lee, Jae-Gil},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/park2023neurips-robust/}
}