Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization

Abstract

Quantization is a widely used technique for compressing and accelerating deep neural networks. However, conventional quantization methods use the same bit-width for all (or most) layers, which often causes significant accuracy degradation in the ultra-low-precision regime and ignores the fact that emerging hardware accelerators have begun to support mixed-precision computation. We therefore present a novel and principled framework for solving the mixed-precision quantization problem. We first formulate mixed-precision quantization as a discrete constrained optimization problem. Then, to make the optimization tractable, we approximate the objective function with a second-order Taylor expansion and propose an efficient approach to computing its Hessian matrix. Finally, based on this simplification, we show that the original problem can be reformulated as a Multiple Choice Knapsack Problem (MCKP) and propose a greedy search algorithm to solve it efficiently. Compared with existing mixed-precision quantization methods, our method is derived in a principled way and is much more computationally efficient. Moreover, extensive experiments on the ImageNet dataset with a variety of network architectures demonstrate its superiority over existing uniform and mixed-precision quantization approaches.
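
To make the MCKP view concrete, the following is a minimal Python sketch of a generic greedy heuristic for choosing one bit-width per layer under a size budget. It is an illustration under stated assumptions, not the authors' algorithm: the function name `greedy_mckp`, the inputs `sizes` and `losses`, and the toy numbers are all hypothetical, and the per-layer loss increases are assumed to be given (in the paper's setting they would come from the second-order approximation).

```python
from typing import Dict, List


def greedy_mckp(
    sizes: List[Dict[int, float]],
    losses: List[Dict[int, float]],
    budget: float,
) -> List[int]:
    """Pick one bit-width per layer so that the total size fits the budget.

    sizes[i][b]  : storage cost of layer i at bit-width b (assumed input)
    losses[i][b] : estimated loss increase of layer i at bit-width b (assumed input)
    """
    # Start from the highest available bit-width in every layer.
    choice = [max(layer) for layer in sizes]
    total = sum(sizes[i][b] for i, b in enumerate(choice))

    while total > budget:
        best = None  # (loss added per unit of size saved, layer index, new bit-width)
        for i, b in enumerate(choice):
            lower = [x for x in sizes[i] if x < b]
            if not lower:
                continue  # this layer is already at its lowest bit-width
            nb = max(lower)  # step down to the next lower bit-width
            saved = sizes[i][b] - sizes[i][nb]
            if saved <= 0:
                continue
            ratio = (losses[i][nb] - losses[i][b]) / saved
            if best is None or ratio < best[0]:
                best = (ratio, i, nb)
        if best is None:
            raise ValueError("budget cannot be met with the given bit-widths")
        _, i, nb = best
        total -= sizes[i][choice[i]] - sizes[i][nb]
        choice[i] = nb
    return choice


# Toy example with two layers and candidate bit-widths {2, 4, 8} (made-up numbers).
sizes = [{2: 200, 4: 400, 8: 800}, {2: 100, 4: 200, 8: 400}]
losses = [{2: 1.0, 4: 0.1, 8: 0.0}, {2: 2.0, 4: 0.4, 8: 0.0}]
print(greedy_mckp(sizes, losses, budget=700))  # → [4, 4]
```

The heuristic repeatedly lowers the bit-width of whichever layer adds the least estimated loss per unit of size saved. Like most greedy knapsack heuristics, it is fast but not guaranteed to find the optimal assignment.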

Cite

Text

Chen et al. "Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00530

Markdown

[Chen et al. "Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/chen2021iccv-mixedprecision/) doi:10.1109/ICCV48922.2021.00530

BibTeX

@inproceedings{chen2021iccv-mixedprecision,
  title     = {{Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization}},
  author    = {Chen, Weihan and Wang, Peisong and Cheng, Jian},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {5350--5359},
  doi       = {10.1109/ICCV48922.2021.00530},
  url       = {https://mlanthology.org/iccv/2021/chen2021iccv-mixedprecision/}
}