Coreset-Driven Re-Labeling: Tackling Noisy Annotations with Noise-Free Gradients
Abstract
Large-scale datasets invariably contain annotation noise. Re-labeling methods have been developed to handle this noise, but they are particularly time-consuming and computationally intensive. Selecting a representative coreset can drastically reduce both the computational cost and the time required. In this work, we adapt a noise-free gradient-based coreset selection method to re-labeling on noisy datasets with erroneous labels, and we introduce a ‘confidence score’ into the coreset selection method to account for the presence of noisy labels. Through extensive evaluation on the CIFAR-100N, WebVision, and ImageNet-1K datasets, we demonstrate that our method outperforms state-of-the-art (SOTA) coreset selection methods for re-labeling with DivideMix and SOP+. The codebase is available at URL.
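The abstract describes the approach only at a high level. As a rough illustration of what gradient-based coreset selection with a per-sample confidence score can look like, the sketch below greedily maximizes a CRAIG-style facility-location objective over per-sample gradient vectors, F(S) = Σ_i c_i · max_{j∈S} sim(g_i, g_j), where c_i is a confidence score in [0, 1]. This is not the authors' algorithm: the function names `pairwise_similarity` and `select_coreset`, the distance-to-similarity conversion, and the use of confidence as a coverage weight are assumptions made for illustration, and the sketch takes gradient vectors as given rather than modeling how noise-free gradients are obtained.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def pairwise_similarity(grads: Sequence[Vector]) -> List[List[float]]:
    """Turn Euclidean gradient distances into non-negative similarities.

    sim(i, j) = d_max - ||g_i - g_j||, so identical gradients score highest.
    """
    n = len(grads)
    dist = [[math.dist(grads[i], grads[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(row) for row in dist)
    return [[d_max - dist[i][j] for j in range(n)] for i in range(n)]


def select_coreset(grads: Sequence[Vector], confidence: Sequence[float], k: int) -> List[int]:
    """Greedily maximize F(S) = sum_i confidence[i] * max_{j in S} sim(i, j).

    Hypothetical illustration: confidence[i] in [0, 1] down-weights how much
    sample i (e.g. a likely mislabeled one) needs to be represented by the coreset.
    """
    n = len(grads)
    if len(confidence) != n:
        raise ValueError("need one confidence score per sample")
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= number of samples")

    sim = pairwise_similarity(grads)
    covered = [0.0] * n  # covered[i] = max similarity of sample i to the current coreset
    selected: List[int] = []

    for _ in range(k):
        best_gain, best_j = -1.0, -1
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain in F from adding candidate j to the coreset.
            gain = sum(
                confidence[i] * max(0.0, sim[i][j] - covered[i]) for i in range(n)
            )
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        covered = [max(covered[i], sim[i][best_j]) for i in range(n)]

    return selected


# Toy per-sample gradients: two tight clusters plus one isolated sample
# whose low confidence marks it as a likely noisy label.
grads = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
confidence = [0.9, 0.9, 0.9, 0.9, 0.9, 0.05]
print(select_coreset(grads, confidence, k=2))
```

In this toy example the two picks come from the two dense clusters and the isolated low-confidence sample is not chosen. Because the weights are non-negative, the weighted facility-location objective stays monotone submodular, so plain greedy selection keeps the usual (1 - 1/e) approximation guarantee; the O(k·n²) loop is only meant for small illustrations.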
Cite
Text
Mohanty and Mopuri. "Coreset-Driven Re-Labeling: Tackling Noisy Annotations with Noise-Free Gradients." Transactions on Machine Learning Research, 2025.
Markdown
[Mohanty and Mopuri. "Coreset-Driven Re-Labeling: Tackling Noisy Annotations with Noise-Free Gradients." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/mohanty2025tmlr-coresetdriven/)
BibTeX
@article{mohanty2025tmlr-coresetdriven,
title = {{Coreset-Driven Re-Labeling: Tackling Noisy Annotations with Noise-Free Gradients}},
author = {Mohanty, Saumyaranjan and Mopuri, Konda Reddy},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/mohanty2025tmlr-coresetdriven/}
}