Coreset-Driven Re-Labeling: Tackling Noisy Annotations with Noise-Free Gradients

Saumyaranjan Mohanty, Konda Reddy Mopuri

TMLR 2025

Abstract

Large-scale datasets invariably contain annotation noise. Re-labeling methods have been developed to handle this noise, but they are time-consuming and computationally intensive. Both the compute and the time required can be drastically reduced by selecting a representative coreset. In this work, we adapt a noise-free gradient-based coreset selection method to re-labeling applications for datasets with erroneous labels. We introduce a ‘confidence score’ into the coreset selection method to account for the presence of noisy labels. Through extensive evaluation on the CIFAR-100N, WebVision, and ImageNet-1K datasets, we demonstrate that our method outperforms state-of-the-art (SOTA) coreset selection for re-labeling methods (DivideMix and SOP+). We have provided the codebase at URL.

Cite

Text

Mohanty and Mopuri. "Coreset-Driven Re-Labeling: Tackling Noisy Annotations with Noise-Free Gradients." Transactions on Machine Learning Research, 2025.

Markdown

[Mohanty and Mopuri. "Coreset-Driven Re-Labeling: Tackling Noisy Annotations with Noise-Free Gradients." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/mohanty2025tmlr-coresetdriven/)

BibTeX

@article{mohanty2025tmlr-coresetdriven,
  title     = {{Coreset-Driven Re-Labeling: Tackling Noisy Annotations with Noise-Free Gradients}},
  author    = {Mohanty, Saumyaranjan and Mopuri, Konda Reddy},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/mohanty2025tmlr-coresetdriven/}
}