DSD²: Can We Dodge Sparse Double Descent and Compress the Neural Network Worry-Free?

Abstract

Recent works have shown that modern deep learning models can exhibit a sparse double descent phenomenon. Indeed, as the sparsity of the model increases, the test performance first worsens because the model overfits the training data; then the overfitting reduces, leading to an improvement in performance; and finally, the model begins to forget critical information, resulting in underfitting. This behavior prevents the use of traditional early-stopping criteria. In this work, we make three key contributions. First, we propose a learning framework that avoids this phenomenon and improves generalization. Second, we introduce an entropy measure that provides more insight into the emergence of this phenomenon and enables the use of traditional stopping criteria. Third, we provide a comprehensive quantitative analysis of contingent factors such as re-initialization methods, model width and depth, and dataset noise. The contributions are supported by empirical evidence in typical setups. Our code is available at https://github.com/VGCQ/DSD2.
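The sparsity sweep underlying the phenomenon is typically obtained with unstructured magnitude pruning: at each target sparsity level, the smallest-magnitude weights are zeroed and test performance is measured. Below is a minimal NumPy sketch of that pruning step (an illustrative assumption, not the authors' full framework, which also involves training and re-initialization):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of smallest-magnitude entries
    (unstructured magnitude pruning, sketched for illustration)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest magnitude; keep strictly larger entries.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Sweep increasing sparsity levels, as in a double descent curve.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))  # stand-in for one layer's weights
for s in [0.2, 0.5, 0.9]:
    pruned = magnitude_prune(w, s)
    achieved = 1.0 - np.count_nonzero(pruned) / pruned.size
    print(f"target sparsity {s:.0%} -> achieved {achieved:.2%}")
```

In an actual experiment, each pruned model would be fine-tuned and evaluated on the test set; plotting test accuracy against sparsity is what reveals the worsen-improve-underfit shape described above.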

Cite

Text

Quétu and Tartaglione. "DSD²: Can We Dodge Sparse Double Descent and Compress the Neural Network Worry-Free?." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I13.29393

Markdown

[Quétu and Tartaglione. "DSD²: Can We Dodge Sparse Double Descent and Compress the Neural Network Worry-Free?." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/quetu2024aaai-dsd/) doi:10.1609/AAAI.V38I13.29393

BibTeX

@inproceedings{quetu2024aaai-dsd,
  title     = {{DSD²: Can We Dodge Sparse Double Descent and Compress the Neural Network Worry-Free?}},
  author    = {Quétu, Victor and Tartaglione, Enzo},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {14749--14757},
  doi       = {10.1609/AAAI.V38I13.29393},
  url       = {https://mlanthology.org/aaai/2024/quetu2024aaai-dsd/}
}