A Random Matrix Analysis of Learning with Noisy Labels

Abstract

This paper provides theoretical insights into high-dimensional binary classification with class-conditional noisy labels. Specifically, we study the behavior of a linear classifier with a label-noise-aware loss function when both the data dimension $p$ and the sample size $n$ are large and comparable. Relying on random matrix theory and assuming a Gaussian mixture data model, the performance of the linear classifier as $p,n\to \infty$ is shown to converge towards a limit involving scalar statistics of the data. Importantly, our findings show that the low-dimensional intuitions for handling label noise do not carry over to high dimensions, in the sense that the classifier that is optimal in low dimensions fails dramatically in high dimensions. Based on our derivations, we design an optimized method that is provably more efficient at handling noisy labels in high dimensions. Our theoretical conclusions are further confirmed by experiments on real datasets, where we show that our optimized approach outperforms the considered baselines.
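The setting the abstract describes can be sketched numerically. The snippet below is a minimal illustration, not the paper's exact protocol: it draws a two-class Gaussian mixture in dimension $p$, flips labels with class-conditional probabilities, and trains a ridge-regularized least-squares linear classifier on the noisy labels (the specific means, flip rates, and regularization are assumptions for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative setup: two-class Gaussian mixture in dimension p
# with means +/- mu and identity covariance.
p, n = 200, 1000
mu = np.zeros(p)
mu[0] = 2.0                             # class mean direction (assumed)
y = rng.choice([-1, 1], size=n)         # clean labels
X = y[:, None] * mu + rng.standard_normal((n, p))

# Class-conditional label noise: each class has its own flip probability.
eps = {1: 0.2, -1: 0.3}                 # flip rates (assumed)
flip = rng.random(n) < np.where(y == 1, eps[1], eps[-1])
y_noisy = np.where(flip, -y, y)

# Ridge-regularized least-squares linear classifier trained on noisy labels.
lam = 1.0
w = np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y_noisy / n)

# Evaluate on fresh clean test data.
y_te = rng.choice([-1, 1], size=2000)
X_te = y_te[:, None] * mu + rng.standard_normal((2000, p))
acc = np.mean(np.sign(X_te @ w) == y_te)
print(f"test accuracy: {acc:.3f}")
```

Even this plain least-squares baseline remains well above chance under label noise; the paper's analysis characterizes such test performance in the $p,n\to\infty$ regime through scalar statistics of the data.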

Cite

Text

El Firdoussi and Seddik. "A Random Matrix Analysis of Learning with Noisy Labels." ICML 2024 Workshops: HiLD, 2024.

Markdown

[El Firdoussi and Seddik. "A Random Matrix Analysis of Learning with Noisy Labels." ICML 2024 Workshops: HiLD, 2024.](https://mlanthology.org/icmlw/2024/firdoussi2024icmlw-random/)

BibTeX

@inproceedings{firdoussi2024icmlw-random,
  title     = {{A Random Matrix Analysis of Learning with Noisy Labels}},
  author    = {El Firdoussi, Aymane and Seddik, Mohamed El Amine},
  booktitle = {ICML 2024 Workshops: HiLD},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/firdoussi2024icmlw-random/}
}