Does ‘Deep Learning on a Data Diet’ Reproduce? Overall Yes, but GraNd at Initialization Does Not

Abstract

Training deep neural networks on vast datasets often results in substantial computational demands, underscoring the need for efficient data pruning. In this context, we critically re-evaluate the data pruning metrics introduced in ‘Deep Learning on a Data Diet’ by Paul et al. (2021): the Gradient Norm (GraNd) at initialization and the Error L2 Norm (EL2N). Our analysis uncovers a strong correlation between a sample's GraNd score at initialization and its input norm, suggesting the latter as a potential baseline for data pruning. However, comprehensive tests on CIFAR-10 show that neither metric outperforms random pruning, contradicting one of the findings in Paul et al. (2021). We trace the discrepancy in the GraNd-at-initialization results to a bug in FLAX's checkpoint restoring mechanism that has since been fixed (https://github.com/google/flax/commit/28fbd95500f4bf2f9924d2560062fa50e919b1a5). Altogether, our findings do not support using the input norm or GraNd scores at initialization for effective data pruning. Nevertheless, EL2N and GraNd scores at later training epochs do provide useful pruning signals, aligning with the expected performance.
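
To make the compared metrics concrete, the sketch below computes per-sample EL2N, GraNd, and input-norm scores for a toy linear classifier in JAX. The model, data shapes, and function names are illustrative assumptions, not the authors' implementation or the experimental setup of Paul et al. (2021).

# Minimal sketch (not the authors' code): per-sample EL2N, GraNd, and input-norm
# scores for a toy linear classifier in JAX. Model, shapes, and data are assumptions.
import jax
import jax.numpy as jnp

def logits_fn(params, x):
    # Toy linear model standing in for the network used in Paul et al. (2021).
    return x @ params["W"] + params["b"]

def el2n_score(params, x, y_onehot):
    # EL2N: L2 norm of the error vector (softmax probabilities minus one-hot label).
    probs = jax.nn.softmax(logits_fn(params, x))
    return jnp.linalg.norm(probs - y_onehot)

def grand_score(params, x, y_onehot):
    # GraNd: norm of the per-sample gradient of the cross-entropy loss w.r.t. the parameters.
    def loss(p):
        logp = jax.nn.log_softmax(logits_fn(p, x))
        return -jnp.sum(y_onehot * logp)
    grads = jax.grad(loss)(params)
    return jnp.sqrt(sum(jnp.sum(g ** 2) for g in jax.tree_util.tree_leaves(grads)))

def input_norm_score(x):
    # The simple baseline discussed above: the L2 norm of the raw input.
    return jnp.linalg.norm(x)

# Example usage on random data (assumed: 4 samples, 8 features, 3 classes).
key = jax.random.PRNGKey(0)
params = {"W": jax.random.normal(key, (8, 3)), "b": jnp.zeros(3)}
xs = jax.random.normal(key, (4, 8))
ys = jax.nn.one_hot(jnp.array([0, 1, 2, 0]), 3)

el2n = jax.vmap(el2n_score, in_axes=(None, 0, 0))(params, xs, ys)
grand = jax.vmap(grand_score, in_axes=(None, 0, 0))(params, xs, ys)
inorm = jax.vmap(input_norm_score)(xs)
print(el2n, grand, inorm)

Samples would then be ranked by these scores and the lowest-scoring fraction pruned; the per-example gradients are obtained here by composing jax.grad with jax.vmap, a standard JAX pattern.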

Cite

Text

Kirsch. "Does ‘Deep Learning on a Data Diet’ Reproduce? Overall Yes, but GraNd at Initialization Does Not." Transactions on Machine Learning Research, 2023.

Markdown

[Kirsch. "Does ‘Deep Learning on a Data Diet’ Reproduce? Overall Yes, but GraNd at Initialization Does Not." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/kirsch2023tmlr-deep/)

BibTeX

@article{kirsch2023tmlr-deep,
  title     = {{Does ‘Deep Learning on a Data Diet’ Reproduce? Overall Yes, but GraNd at Initialization Does Not}},
  author    = {Kirsch, Andreas},
  journal   = {Transactions on Machine Learning Research},
  year      = {2023},
  url       = {https://mlanthology.org/tmlr/2023/kirsch2023tmlr-deep/}
}