CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization

Abstract

Deep neural networks enable state-of-the-art accuracy on visual recognition tasks such as image classification and object detection. However, modern deep networks contain millions of learned weights; a more efficient utilization of computational resources would assist in a variety of deployment scenarios, from embedded platforms with resource constraints to computing clusters running ensembles of networks. In this paper, we combine network pruning and weight quantization in a single learning framework that performs pruning and quantization jointly, and in parallel with fine-tuning. This allows us to take advantage of the complementary nature of pruning and quantization and to recover from premature pruning errors, which is not possible with current two-stage approaches. Our proposed CLIP-Q method (Compression Learning by In-Parallel Pruning-Quantization) compresses AlexNet 51-fold, GoogLeNet 10-fold, and ResNet-50 15-fold, while preserving the uncompressed network accuracies on ImageNet.
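
The abstract does not spell out the pruning or quantization rules, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes magnitude-based pruning and equal-width interval quantization of the surviving weights, and it uses a toy squared-error loss. The names `prune_and_quantize`, `fine_tune_step`, `prune_fraction`, `bits`, and `target` are hypothetical and chosen for illustration.

```python
def prune_and_quantize(weights, prune_fraction=0.5, bits=2):
    """Return a pruned and quantized copy of a flat list of float weights.

    Assumed rules (not taken from the abstract): prune by smallest
    magnitude, then map surviving weights to the mean of their
    equal-width interval. Requires bits >= 1.
    """
    if not weights:
        return []
    # Pruning: mark the smallest-magnitude weights to be set to zero.
    n_prune = int(len(weights) * prune_fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    kept = [w for i, w in enumerate(weights) if i not in pruned]
    if not kept:
        return [0.0] * len(weights)

    # Quantization: 2**bits - 1 intervals, leaving one code for zero.
    levels = 2 ** bits - 1
    lo, hi = min(kept), max(kept)
    width = (hi - lo) / levels or 1.0  # avoid zero width when all kept weights are equal

    def bucket(w):
        return min(int((w - lo) / width), levels - 1)

    sums, counts = [0.0] * levels, [0] * levels
    for w in kept:
        sums[bucket(w)] += w
        counts[bucket(w)] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]

    return [0.0 if i in pruned else means[bucket(w)]
            for i, w in enumerate(weights)]


def fine_tune_step(full_precision, target, lr=0.1, **kwargs):
    """One toy update: evaluate the loss on the compressed weights but
    apply the gradient to the full-precision copy, so a weight pruned in
    this step can grow and survive pruning in a later step."""
    compressed = prune_and_quantize(full_precision, **kwargs)
    # Gradient of 0.5 * (c - t)**2 with respect to c is (c - t).
    return [w - lr * (c - t)
            for w, c, t in zip(full_precision, compressed, target)]
```

Because `fine_tune_step` recomputes the pruning mask from the full-precision weights on every call, the set of pruned weights is not fixed up front. Under these assumptions, that is the kind of recovery from premature pruning that a two-stage prune-then-quantize pipeline would not allow.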

Cite

Text

Tung and Mori. "CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. doi:10.1109/CVPR.2018.00821

Markdown

[Tung and Mori. "CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.](https://mlanthology.org/cvpr/2018/tung2018cvpr-clipq/) doi:10.1109/CVPR.2018.00821

BibTeX

@inproceedings{tung2018cvpr-clipq,
  title     = {{CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization}},
  author    = {Tung, Frederick and Mori, Greg},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2018},
  doi       = {10.1109/CVPR.2018.00821},
  url       = {https://mlanthology.org/cvpr/2018/tung2018cvpr-clipq/}
}