Network Pruning at Scale: A Discrete Optimization Approach
Abstract
Due to the ever-growing size of neural network models, there has been growing interest in compressing (i.e., pruning) neural networks by sparsifying the weights of a pre-trained network while maintaining the performance of the dense model as much as possible. In this work, we focus on a neural network pruning framework based on local quadratic models of the loss function. We present an optimization-based approach with an $\ell_0$-regression formulation, and propose novel algorithms to obtain good solutions to the resulting combinatorial optimization problem. In practice, our basic (single-stage) approach, based on a single local quadratic model approximation, is up to $10^3$ times faster than existing methods while achieving similar accuracy. We also propose a multi-stage method that outperforms other methods in accuracy for a given sparsity constraint while remaining computationally efficient. In particular, our approach yields a 98\% sparse (i.e., 98\% of the weights in the dense model are set to zero) MLPNet with 90\% test accuracy (i.e., a 3\% reduction in test accuracy compared to the dense model), an improvement over the previous best accuracy (55\%).
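To make the setup concrete, the sketch below illustrates (in a simplified, generic form) how pruning can be posed as an $\ell_0$-constrained minimization of a local quadratic model of the loss. This is not the paper's exact algorithm: the solver shown is plain iterative hard thresholding, and the Hessian proxy `H = X^T X` built from layer inputs is an assumption used only for illustration.

```python
import numpy as np

def prune_layer_iht(w_dense, H, k, lr=None, n_iter=200):
    """Illustrative sketch: approximately solve
        min_w 0.5 * (w - w_dense)^T H (w - w_dense)  s.t.  ||w||_0 <= k
    by projected gradient descent (iterative hard thresholding).
    H plays the role of a local quadratic model of the loss around the dense weights."""
    if lr is None:
        # step size from the largest eigenvalue of H (Lipschitz constant of the gradient)
        lr = 1.0 / np.linalg.eigvalsh(H)[-1]
    w = w_dense.copy()
    for _ in range(n_iter):
        grad = H @ (w - w_dense)           # gradient of the quadratic model
        w = w - lr * grad                  # gradient step
        idx = np.argsort(np.abs(w))[:-k]   # indices of all but the k largest-magnitude entries
        w[idx] = 0.0                       # hard-threshold: keep only k nonzeros
    return w

# Toy usage: prune a 100-weight layer to 10 nonzeros.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 100))        # hypothetical layer input activations
H = X.T @ X / X.shape[0]                   # empirical Hessian proxy (assumption)
w_dense = rng.standard_normal(100)
w_sparse = prune_layer_iht(w_dense, H, k=10)
print(np.count_nonzero(w_sparse))          # -> 10
```

A multi-stage variant, in this simplified view, would repeat such a step over a sequence of sparsity levels, refitting the local quadratic model between stages.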
Cite
Text
Chen et al. "Network Pruning at Scale: A Discrete Optimization Approach." NeurIPS 2022 Workshops: OPT, 2022. https://mlanthology.org/neuripsw/2022/chen2022neuripsw-network/
BibTeX
@inproceedings{chen2022neuripsw-network,
title = {{Network Pruning at Scale: A Discrete Optimization Approach}},
author = {Chen, Wenyu and Benbaki, Riade and Meng, Xiang and Mazumder, Rahul},
booktitle = {NeurIPS 2022 Workshops: OPT},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/chen2022neuripsw-network/}
}