Post Training Mixed-Precision Quantization Based on Key Layers Selection

Abstract

Model quantization has been extensively used to compress and accelerate deep neural network inference. Post-training quantization methods have gained considerable attention because they are simple to use. However, when a model is quantized below 8 bits, it suffers significant accuracy degradation. This paper seeks to address this problem by building mixed-precision inference networks based on the selection of key activation layers. During post-training quantization, key activation layers are quantized to 8-bit precision, while non-key activation layers are quantized to 4-bit precision. The experimental results show a notable improvement with our method. Relative to ResNet-50 (W8A8) and VGG-16 (W8A8), the proposed method accelerates inference with lower power consumption and only slight accuracy loss.
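As a rough illustration of the mixed-precision scheme the abstract describes, the sketch below simulates symmetric uniform fake-quantization at a chosen bit width and applies 8-bit precision to a designated set of key layers and 4-bit precision to the rest. The function names and the dictionary-of-activations interface are hypothetical, not from the paper; the actual key-layer selection criterion is defined in the paper itself.

```python
import numpy as np

def fake_quantize(x, bits):
    """Simulate symmetric, per-tensor uniform quantization at `bits` bits."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8-bit, 7 for 4-bit
    scale = np.max(np.abs(x)) / qmax        # map the largest magnitude to qmax
    return np.round(x / scale).clip(-qmax, qmax) * scale

def mixed_precision_quantize(activations, key_layers):
    """Quantize key activation layers at 8 bits and all others at 4 bits."""
    return {
        name: fake_quantize(act, 8 if name in key_layers else 4)
        for name, act in activations.items()
    }
```

Because the 4-bit grid has only 15 levels versus 255 for 8-bit, the quantization error is substantially larger for non-key layers, which is why restricting 4-bit precision to less sensitive layers limits the overall accuracy loss.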

Cite

Text

Liang. "Post Training Mixed-Precision Quantization Based on Key Layers Selection." European Conference on Computer Vision Workshops, 2020. doi:10.1007/978-3-030-68238-5_9

Markdown

[Liang. "Post Training Mixed-Precision Quantization Based on Key Layers Selection." European Conference on Computer Vision Workshops, 2020.](https://mlanthology.org/eccvw/2020/liang2020eccvw-post/) doi:10.1007/978-3-030-68238-5_9

BibTeX

@inproceedings{liang2020eccvw-post,
  title     = {{Post Training Mixed-Precision Quantization Based on Key Layers Selection}},
  author    = {Liang, Lingyan},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2020},
  pages     = {121-125},
  doi       = {10.1007/978-3-030-68238-5_9},
  url       = {https://mlanthology.org/eccvw/2020/liang2020eccvw-post/}
}