Leveraging Unlabeled Data to Track Memorization

Abstract

Deep neural networks can easily memorize noisy labels present in real-world data, which degrades their ability to generalize. It is therefore important to track and evaluate the robustness of models against noisy-label memorization. We propose a metric, called $\textit{susceptibility}$, to gauge such memorization for neural networks. Susceptibility is simple and easy to compute during training. Moreover, it does not require access to ground-truth labels, as it uses only unlabeled data. We empirically show the effectiveness of our metric in tracking memorization on various architectures and datasets, and provide theoretical insights into the design of the susceptibility metric. Finally, we show through extensive experiments on datasets with synthetic and real-world label noise that one can use susceptibility together with the overall training accuracy to distinguish models that maintain low memorization on the training set and generalize well to unseen clean data.

Cite

Text

Forouzesh et al. "Leveraging Unlabeled Data to Track Memorization." International Conference on Learning Representations, 2023.

Markdown

[Forouzesh et al. "Leveraging Unlabeled Data to Track Memorization." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/forouzesh2023iclr-leveraging/)

BibTeX

@inproceedings{forouzesh2023iclr-leveraging,
  title     = {{Leveraging Unlabeled Data to Track Memorization}},
  author    = {Forouzesh, Mahsa and Sedghi, Hanie and Thiran, Patrick},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/forouzesh2023iclr-leveraging/}
}