Greedy Search Algorithm for Mixed Precision in Post-Training Quantization of Convolutional Neural Network Inspired by Submodular Optimization

Satoki, Tsuji; Hiroshi, Kawaguchi; Atsuki, Inoue; Yasufumi, Sakai; Fuyuka, Yamada

Greedy Search Algorithm for Mixed Precision in Post-Training Quantization of Convolutional Neural Network Inspired by Submodular Optimization

Tsuji Satoki, Kawaguchi Hiroshi, Inoue Atsuki, Sakai Yasufumi, Yamada Fuyuka

ACML 2021 pp. 886-901

/acml/2021/satoki2021acml-greedy/

Abstract

For lower bit-widths such as less than 8-bit, many quantization strategies include re-training in order to recover accuracy degradation. However, the re-training works against rapid deployment for wide distribution of quantized models. Therefore, post-training quantization has been getting more attention in recent years. In one example, partial quantization according to the layer sensitivity based on the accuracy after each quantization has been proposed; however, the effects of one layer quantization on the other layers has not taken into account. To further reduce the accuracy degradation, we propose a quantization scheme that considers the effects by continuously updating the accuracy after each layer quantization. Additionally, for more data compression, we extend that scheme to mixed precision, which applies a layer-by-layer fitted bit-width. Since the search space for bit allocation per layer increases exponentially with the number of layers $N$, Existing methods require computationally intensive approach such as network training. Here, we derive practical solutions to the bit allocation problem in polynomial time $O(N^2)$ using a deterministic greedy search algorithm inspired by submodular optimization without any training. For example, the proposed algorithm completes a search on ResNet18 for ImageNet in 1 hour for a single GPU. Compared to the case without updating the layer sensitivity, our method improves the accuracy of the quantized model by more than 1% with multiple convolutional neural networks. For examples, 6-bit quantization of MobileNetV2 achieves 80.1% reduction of model size with -1.10% accuracy degradation. 4-bit quantization of ResNet50 achieves 82.9% size reduction with -0.194% accuracy degradation. Furthermore, results show that the proposed method reduces the accuracy degradation by more than about 0.7% compared to various latest post-training quantization strategies.

PDF ACML Semantic Scholar

Cite

Text

Satoki et al. "Greedy Search Algorithm for Mixed Precision in Post-Training Quantization of Convolutional Neural Network Inspired by Submodular Optimization." Proceedings of The 13th Asian Conference on Machine Learning, 2021.

Markdown

[Satoki et al. "Greedy Search Algorithm for Mixed Precision in Post-Training Quantization of Convolutional Neural Network Inspired by Submodular Optimization." Proceedings of The 13th Asian Conference on Machine Learning, 2021.](https://mlanthology.org/acml/2021/satoki2021acml-greedy/)

BibTeX

@inproceedings{satoki2021acml-greedy,
  title     = {{Greedy Search Algorithm for Mixed Precision in Post-Training Quantization of Convolutional Neural Network Inspired by Submodular Optimization}},
  author    = {Satoki, Tsuji and Hiroshi, Kawaguchi and Atsuki, Inoue and Yasufumi, Sakai and Fuyuka, Yamada},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  year      = {2021},
  pages     = {886-901},
  volume    = {157},
  url       = {https://mlanthology.org/acml/2021/satoki2021acml-greedy/}
}