Non-Uniform Step Size Quantization for Accurate Post-Training Quantization

Abstract

Quantization is a highly effective optimization technique for reducing the hardware cost and memory footprint of deep neural network (DNN) accelerators. In particular, post-training quantization (PTQ) is often preferred as it does not require a full dataset or costly retraining. However, the performance of PTQ lags significantly behind that of quantization-aware training, especially for low-precision networks (<= 4-bit). In this paper, we propose a novel PTQ scheme to bridge the gap, with minimal impact on hardware cost. The main idea of our scheme is to increase arithmetic precision while retaining the same representational precision. The excess arithmetic precision enables us to better match the input data distribution, while also presenting a new optimization problem, for which we propose a novel search-based solution. Our scheme is based on logarithmic-scale quantization, which can help reduce hardware cost through the use of shifters instead of multipliers. Our evaluation results using various DNN models on challenging computer vision tasks (image classification, object detection, semantic segmentation) show superior accuracy compared with state-of-the-art PTQ methods at various low-bit precisions.
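To illustrate the logarithmic-scale quantization the abstract refers to, here is a minimal, generic sketch of power-of-two quantization (not the paper's specific non-uniform step-size scheme): each weight is snapped to a signed power of two, so multiplying an activation by a quantized weight reduces to a bit shift in hardware. All function names and the bit-allocation choice below are illustrative assumptions.

```python
import numpy as np

def log2_quantize(w, n_bits=4):
    """Generic power-of-two (log-scale) quantization sketch.

    Maps each weight to sign(w) * 2^e, where the exponent e is rounded
    to the nearest integer and clipped so that 2^(n_bits-1) - 1 exponent
    levels below the maximum remain representable (an illustrative
    bit allocation, not the paper's).
    """
    sign = np.sign(w)
    mag = np.abs(w)
    nonzero = mag > 0  # avoid log(0); near-zero weights map to zero
    e = np.zeros_like(mag)
    e[nonzero] = np.round(np.log2(mag[nonzero]))
    # Clip the exponent range relative to the largest exponent present.
    e_max = e[nonzero].max() if nonzero.any() else 0.0
    e_min = e_max - (2 ** (n_bits - 1) - 1)
    e = np.clip(e, e_min, e_max)
    q = sign * np.exp2(e)
    q[~nonzero] = 0.0
    return q

w = np.array([0.3, -0.07, 1.9, 0.0])
print(log2_quantize(w, n_bits=4))
```

Because every quantized weight is a power of two, a product like `x * 2^e` can be computed with a shifter rather than a multiplier, which is the hardware-cost advantage the abstract describes.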

Cite

Text

Oh et al. "Non-Uniform Step Size Quantization for Accurate Post-Training Quantization." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-20083-0_39

Markdown

[Oh et al. "Non-Uniform Step Size Quantization for Accurate Post-Training Quantization." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/oh2022eccv-nonuniform/) doi:10.1007/978-3-031-20083-0_39

BibTeX

@inproceedings{oh2022eccv-nonuniform,
  title     = {{Non-Uniform Step Size Quantization for Accurate Post-Training Quantization}},
  author    = {Oh, Sangyun and Sim, Hyeonuk and Kim, Jounghyun and Lee, Jongeun},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-20083-0_39},
  url       = {https://mlanthology.org/eccv/2022/oh2022eccv-nonuniform/}
}