Understanding Diversity Based Neural Network Pruning in Teacher Student Setup

Abstract

Despite a multitude of empirical advances, there is a lack of theoretical understanding of the effectiveness of different pruning methods. We inspect different pruning techniques under the statistical mechanics formulation of a teacher-student framework and derive their generalization error (GE) bounds. In the first part, we theoretically prove the empirical observations of a recent work showing that the Determinantal Point Process (DPP) based node pruning method is notably superior to competing approaches when tested on real datasets. In the second part, we use our theoretical setup to prove that the baseline random edge pruning method performs better than the DPP node pruning method, consistent with the finding in the literature that sparse neural networks (edge pruned) generalize better than dense neural networks (node pruned) for a fixed number of parameters.
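
The following is a minimal illustrative sketch, not the authors' implementation, of the two pruning styles compared in the abstract: DPP-style diverse node pruning of a hidden layer versus random edge pruning at a matching parameter budget. Diversity is approximated here with a greedy log-determinant selection over a cosine-similarity kernel of hidden-unit weight vectors (a stand-in for exact DPP sampling); all dimensions, names, and the kernel choice are assumptions for illustration.

```python
# Hypothetical sketch: DPP-style node pruning vs. random edge pruning
# on one hidden layer of a trained "student" network.
import numpy as np

rng = np.random.default_rng(0)

d, m, k = 20, 10, 5          # input dim, hidden units, units to keep (illustrative)
W = rng.normal(size=(m, d))  # hidden-layer weight matrix of the student

def greedy_dpp_nodes(W, k):
    """Greedily pick k rows of W maximizing log-det of a cosine-similarity kernel
    (a common MAP-style approximation to sampling a diverse subset from a DPP)."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    K = Wn @ Wn.T + 1e-6 * np.eye(len(W))   # PSD similarity kernel over hidden units
    chosen = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in range(len(W)):
            if i in chosen:
                continue
            idx = chosen + [i]
            logdet = np.linalg.slogdet(K[np.ix_(idx, idx)])[1]
            if logdet > best_logdet:
                best, best_logdet = i, logdet
        chosen.append(best)
    return sorted(chosen)

def random_edge_prune(W, keep_frac):
    """Zero out a random subset of individual weights (edges), keeping keep_frac on average."""
    mask = rng.random(W.shape) < keep_frac
    return W * mask

kept_nodes = greedy_dpp_nodes(W, k)          # node pruning: keep k diverse hidden units
W_node_pruned = W[kept_nodes]
W_edge_pruned = random_edge_prune(W, k / m)  # edge pruning: same parameter budget on average

print("kept nodes:", kept_nodes)
print("remaining params (node / edge):", W_node_pruned.size, int((W_edge_pruned != 0).sum()))
```

Both prunings keep roughly the same number of parameters, which is the regime in which the paper compares their generalization error bounds.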

Cite

Text

Acharyya et al. "Understanding Diversity Based Neural Network Pruning in Teacher Student Setup." ICLR 2021 Workshops: Neural_Compression, 2021.

Markdown

[Acharyya et al. "Understanding Diversity Based Neural Network Pruning in Teacher Student Setup." ICLR 2021 Workshops: Neural_Compression, 2021.](https://mlanthology.org/iclrw/2021/acharyya2021iclrw-understanding/)

BibTeX

@inproceedings{acharyya2021iclrw-understanding,
  title     = {{Understanding Diversity Based Neural Network Pruning in Teacher Student Setup}},
  author    = {Acharyya, Rupam and Chattoraj, Ankani and Zhang, Boyu and Das, Shouman and Stefankovic, Daniel},
  booktitle = {ICLR 2021 Workshops: Neural_Compression},
  year      = {2021},
  url       = {https://mlanthology.org/iclrw/2021/acharyya2021iclrw-understanding/}
}