Greedy Search Algorithm for Mixed Precision in Post-Training Quantization of Convolutional Neural Network Inspired by Submodular Optimization
Abstract
For lower bit-widths such as less than 8-bit, many quantization strategies include re-training in order to recover accuracy degradation. However, the re-training works against rapid deployment for wide distribution of quantized models. Therefore, post-training quantization has been getting more attention in recent years. In one example, partial quantization according to the layer sensitivity based on the accuracy after each quantization has been proposed; however, the effects of one layer quantization on the other layers has not taken into account. To further reduce the accuracy degradation, we propose a quantization scheme that considers the effects by continuously updating the accuracy after each layer quantization. Additionally, for more data compression, we extend that scheme to mixed precision, which applies a layer-by-layer fitted bit-width. Since the search space for bit allocation per layer increases exponentially with the number of layers $N$, Existing methods require computationally intensive approach such as network training. Here, we derive practical solutions to the bit allocation problem in polynomial time $O(N^2)$ using a deterministic greedy search algorithm inspired by submodular optimization without any training. For example, the proposed algorithm completes a search on ResNet18 for ImageNet in 1 hour for a single GPU. Compared to the case without updating the layer sensitivity, our method improves the accuracy of the quantized model by more than 1% with multiple convolutional neural networks. For examples, 6-bit quantization of MobileNetV2 achieves 80.1% reduction of model size with -1.10% accuracy degradation. 4-bit quantization of ResNet50 achieves 82.9% size reduction with -0.194% accuracy degradation. Furthermore, results show that the proposed method reduces the accuracy degradation by more than about 0.7% compared to various latest post-training quantization strategies.
Cite
Text
Satoki et al. "Greedy Search Algorithm for Mixed Precision in Post-Training Quantization of Convolutional Neural Network Inspired by Submodular Optimization." Proceedings of The 13th Asian Conference on Machine Learning, 2021.Markdown
[Satoki et al. "Greedy Search Algorithm for Mixed Precision in Post-Training Quantization of Convolutional Neural Network Inspired by Submodular Optimization." Proceedings of The 13th Asian Conference on Machine Learning, 2021.](https://mlanthology.org/acml/2021/satoki2021acml-greedy/)BibTeX
@inproceedings{satoki2021acml-greedy,
title = {{Greedy Search Algorithm for Mixed Precision in Post-Training Quantization of Convolutional Neural Network Inspired by Submodular Optimization}},
author = {Satoki, Tsuji and Hiroshi, Kawaguchi and Atsuki, Inoue and Yasufumi, Sakai and Fuyuka, Yamada},
booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
year = {2021},
pages = {886-901},
volume = {157},
url = {https://mlanthology.org/acml/2021/satoki2021acml-greedy/}
}