Differentiable Joint Pruning and Quantization for Hardware Efficiency
Abstract
We present a differentiable joint pruning and quantization (DJPQ) scheme. We frame neural network compression as a joint gradient-based optimization problem that automatically trades off model pruning against quantization for hardware efficiency. DJPQ combines structured pruning based on the variational information bottleneck with mixed-precision quantization in a single differentiable loss function. Whereas previous works consider pruning and quantization separately, our method lets users find the optimal trade-off between the two in a single training procedure. To make hardware inference more efficient, we extend DJPQ to integrate structured pruning with quantization restricted to power-of-two bit-widths. We show that DJPQ significantly reduces the number of bit operations (BOPs) for several networks while maintaining the top-1 accuracy of the original floating-point models (e.g., a 53× BOPs reduction for ResNet18 on ImageNet and 43× for MobileNetV2). Compared to the conventional two-stage approach, which optimizes pruning and quantization independently, our scheme achieves both higher accuracy and fewer BOPs. Even with bit-restricted quantization, DJPQ achieves larger compression ratios and better accuracy than the two-stage approach.
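To make the BOPs metric concrete, the sketch below assumes the common definition for a convolution: the number of multiply-accumulate operations multiplied by the weight bit-width and the activation bit-width (the paper's exact accounting may differ). It shows why pruning and quantization interact multiplicatively in this metric, and how a power-of-two restriction rounds learned bit-widths up. The function names, layer shape, and bit-widths are hypothetical and chosen only for illustration.

```python
import math


def conv_bops(c_in: int, c_out: int, k: int, h_out: int, w_out: int,
              w_bits: int, a_bits: int) -> int:
    """Bit operations of a k x k convolution (assumed definition):
    multiply-accumulates times weight bits times activation bits."""
    macs = c_in * c_out * k * k * h_out * w_out
    return macs * w_bits * a_bits


def pow2_bits(bits: float) -> int:
    """Round a bit-width up to the nearest power of two (e.g. 3 -> 4, 5 -> 8)."""
    return 2 ** math.ceil(math.log2(bits))


# Hypothetical 3x3 layer: 64 input and 64 output channels, 56x56 output map,
# kept in 32-bit floating point as the baseline.
fp32 = conv_bops(64, 64, 3, 56, 56, w_bits=32, a_bits=32)

# Same layer after structured pruning of 16 output channels and mixed-precision
# quantization to 3-bit weights and 5-bit activations.
mixed = conv_bops(64, 48, 3, 56, 56, w_bits=3, a_bits=5)

# Bit-restricted variant: the same bit-widths rounded up to powers of two (4 and 8).
restricted = conv_bops(64, 48, 3, 56, 56,
                       w_bits=pow2_bits(3), a_bits=pow2_bits(5))

print(f"mixed-precision BOPs reduction: {fp32 / mixed:.1f}x")
print(f"power-of-two BOPs reduction:    {fp32 / restricted:.1f}x")
```

Because each pruned output channel also removes an input channel of the following layer, a channel pruned in one layer lowers the BOPs of two layers; in a full network the per-layer counts would simply be summed.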
Cite
Wang et al. "Differentiable Joint Pruning and Quantization for Hardware Efficiency." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58526-6_16
@inproceedings{wang2020eccv-differentiable,
title = {{Differentiable Joint Pruning and Quantization for Hardware Efficiency}},
author = {Wang, Ying and Lu, Yadong and Blankevoort, Tijmen},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2020},
doi = {10.1007/978-3-030-58526-6_16},
url = {https://mlanthology.org/eccv/2020/wang2020eccv-differentiable/}
}