Post Training Mixed-Precision Quantization Based on Key Layers Selection
Abstract
Model quantization has been widely used to compress and accelerate deep neural network inference. Post-training quantization methods have attracted considerable attention because they are simple to use. However, quantizing a model below 8 bits causes significant accuracy degradation. This paper addresses that problem by building mixed-precision inference networks based on the selection of key activation layers. During post-training quantization, key activation layers are quantized to 8-bit precision, while non-key activation layers are quantized to 4-bit precision. Experimental results show a clear improvement with our method: relative to ResNet-50 (W8A8) and VGG-16 (W8A8), it accelerates inference with lower power consumption and little accuracy loss.
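The core idea in the abstract, assigning 8-bit precision to key activation layers and 4-bit precision to the rest, can be sketched as per-layer uniform quantization. This is a minimal illustration, not the paper's implementation: the layer names and the key-layer set below are hypothetical, and the paper's actual criterion for selecting key layers is not reproduced here.

```python
import numpy as np

def quantize_uniform(x, num_bits):
    # Symmetric uniform "fake" quantization: snap values to a num_bits
    # integer grid, then map back to float to simulate quantized inference.
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8-bit, 7 for 4-bit
    scale = np.max(np.abs(x)) / qmax
    if scale == 0:
        return x.copy()
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def mixed_precision_quantize(activations, key_layers):
    # Key activation layers keep 8-bit precision; all others drop to 4-bit.
    return {name: quantize_uniform(a, 8 if name in key_layers else 4)
            for name, a in activations.items()}
```

As expected, the 4-bit layers incur a larger reconstruction error than the 8-bit key layers, which is why restricting low precision to non-key layers can trade a small accuracy loss for faster, lower-power inference.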
Cite
Text
Liang. "Post Training Mixed-Precision Quantization Based on Key Layers Selection." European Conference on Computer Vision Workshops, 2020. doi:10.1007/978-3-030-68238-5_9
Markdown
[Liang. "Post Training Mixed-Precision Quantization Based on Key Layers Selection." European Conference on Computer Vision Workshops, 2020.](https://mlanthology.org/eccvw/2020/liang2020eccvw-post/) doi:10.1007/978-3-030-68238-5_9
BibTeX
@inproceedings{liang2020eccvw-post,
title = {{Post Training Mixed-Precision Quantization Based on Key Layers Selection}},
author = {Liang, Lingyan},
booktitle = {European Conference on Computer Vision Workshops},
year = {2020},
  pages = {121--125},
doi = {10.1007/978-3-030-68238-5_9},
url = {https://mlanthology.org/eccvw/2020/liang2020eccvw-post/}
}