Uniform Noise Distribution and Compact Clusters: Unveiling the Success of Self-Supervised Learning in Label Noise
Abstract
Label noise is ubiquitous in real-world datasets, posing significant challenges to machine learning models. While self-supervised learning (SSL) algorithms have empirically demonstrated effectiveness in learning noisy labels, the theoretical understanding of their effectiveness remains underexplored. In this paper, we present a theoretical framework to understand how SSL methods enhance learning with noisy labels, especially for the instance-dependent label noise. We reveal that the uniform and compact cluster structures induced by contrastive SSL play a crucial role in mitigating the adverse effects of label noise. Specifically, we theoretically show that a classifier trained on SSL-learned representations significantly outperforms one trained using traditional supervised learning methods. This results from two key merits of SSL representations over label noise: 1. Uniform Noise Distribution: Label noise becomes uniformly distributed over SSL representations with respect to the true class labels, rather than the noisy ones, leading to an easier learning task. 2. Enhanced Cluster Structure: SSL enhances the formation of well-separated and compact categorical clusters, increasing inter-class distances while tightening intra-class clusters. We further theoretically justify the benefits of training a classifier on such structured representations, demonstrating that it encourages the classifier trained on noisy data to be aligned with the optimal classifier. Extensive experiments validate the robustness of SSL representations in combating label noise, confirming the practical values of our theoretical findings.
Cite
Text
Xu et al. "Uniform Noise Distribution and Compact Clusters: Unveiling the Success of Self-Supervised Learning in Label Noise." Transactions on Machine Learning Research, 2025.Markdown
[Xu et al. "Uniform Noise Distribution and Compact Clusters: Unveiling the Success of Self-Supervised Learning in Label Noise." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/xu2025tmlr-uniform/)BibTeX
@article{xu2025tmlr-uniform,
title = {{Uniform Noise Distribution and Compact Clusters: Unveiling the Success of Self-Supervised Learning in Label Noise}},
author = {Xu, Pengcheng and Yi, Li and Xu, Gezheng and Chen, Xi and McLeod, Ian and Ling, Charles and Wang, Boyu},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/xu2025tmlr-uniform/}
}