Model Preserving Compression for Neural Networks
Abstract
After training complex deep learning models, a common task is to compress the model to reduce compute and storage demands. When compressing, it is desirable to preserve the original model's per-example decisions (e.g., to go beyond top-1 accuracy or preserve robustness), maintain the network's structure, automatically determine per-layer compression levels, and eliminate the need for fine-tuning. No existing compression methods simultaneously satisfy these criteria; we introduce a principled approach that does by leveraging interpolative decompositions. Our approach simultaneously selects and eliminates channels (analogously, neurons), then constructs an interpolation matrix that propagates a correction into the next layer, preserving the network's structure. Consequently, our method achieves good performance even without fine-tuning and admits theoretical analysis. Our theoretical generalization bound for a one-layer network lends itself naturally to a heuristic that allows our method to automatically choose per-layer sizes for deep networks. We demonstrate the efficacy of our approach with strong empirical performance on a variety of tasks, models, and datasets, from simple one-hidden-layer networks to deep networks on ImageNet.
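To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of ID-based channel pruning for a single fully connected layer, using SciPy's interpolative decomposition routines. The names `Z`, `W_next`, `k`, and `prune_channels_with_id` are hypothetical, and the low-rank test data is synthetic; the paper's full method additionally covers deep networks and automatic per-layer size selection.

```python
import numpy as np
import scipy.linalg.interpolative as sli

def prune_channels_with_id(Z, W_next, k):
    """Sketch: keep k channels via an interpolative decomposition (ID).

    Z      : (n_examples, n_channels) activations of the layer being pruned
    W_next : (n_channels, n_out) weights of the following layer
    k      : number of channels to keep

    The ID gives Z ~= Z[:, keep] @ P, so folding P into W_next approximately
    preserves the next layer's inputs, and hence per-example decisions.
    """
    idx, proj = sli.interp_decomp(np.asarray(Z, dtype=np.float64), k)
    P = sli.reconstruct_interp_matrix(idx, proj)  # (k, n_channels) interpolation matrix
    keep = idx[:k]                                # indices of retained channels
    return keep, P @ W_next                       # corrected next-layer weights

# Usage on synthetic low-rank activations: outputs before/after should match closely.
rng = np.random.default_rng(0)
Z = rng.standard_normal((256, 24)) @ rng.standard_normal((24, 64))  # rank-24, 64 channels
W_next = rng.standard_normal((64, 10))
keep, W_corr = prune_channels_with_id(Z, W_next, k=32)
print(np.linalg.norm(Z @ W_next - Z[:, keep] @ W_corr))  # ~0 since rank(Z) <= k
```

Because the correction matrix is folded directly into the next layer's weights, the pruned model keeps the same architecture (just with fewer channels), which is what allows the method to work without fine-tuning.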
Cite
Text
Chee et al. "Model Preserving Compression for Neural Networks." Neural Information Processing Systems, 2022.

Markdown

[Chee et al. "Model Preserving Compression for Neural Networks." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/chee2022neurips-model/)

BibTeX
@inproceedings{chee2022neurips-model,
title = {{Model Preserving Compression for Neural Networks}},
author = {Chee, Jerry and Flynn (née Renz), Megan and Damle, Anil and De Sa, Christopher M},
booktitle = {Neural Information Processing Systems},
year = {2022},
url = {https://mlanthology.org/neurips/2022/chee2022neurips-model/}
}