HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision

Dong, Zhen; Yao, Zhewei; Gholami, Amir; Mahoney, Michael W.; Keutzer, Kurt

doi:10.1109/ICCV.2019.00038

ICCV 2019

Abstract

Model size and inference speed/power have become major challenges in deploying neural networks for many applications. A promising approach to address these problems is quantization. However, uniformly quantizing a model to ultra-low precision leads to significant accuracy degradation. A novel solution is mixed-precision quantization, since some layers of the network may tolerate lower precision than others. However, there is no systematic way to determine the precision of each layer. A brute-force approach is not feasible for deep networks, because the mixed-precision search space grows exponentially with the number of layers. A further challenge is the factorial complexity of choosing the block-wise fine-tuning order when quantizing a model to a target precision. Here, we introduce Hessian AWare Quantization (HAWQ), a novel second-order quantization method that addresses these problems. HAWQ automatically selects the relative quantization precision of each layer based on the layer's Hessian spectrum. Moreover, HAWQ provides a deterministic fine-tuning order for quantizing layers. We show results of our method on CIFAR-10 using ResNet20, and on ImageNet using Inception-V3, ResNet50, and SqueezeNext. Compared with state-of-the-art methods, HAWQ achieves similar or better accuracy than DNAS on ResNet20 with an 8x activation compression ratio, and up to 1% higher accuracy with up to 14% smaller models on ResNet50 and Inception-V3 than the recently proposed RVQuant and HAQ. Furthermore, we show that SqueezeNext can be quantized to a model size of just 1MB while achieving above 68% top-1 accuracy on ImageNet.
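
The abstract states only that HAWQ ranks layers by their Hessian spectrum. The following is a minimal, dependency-free sketch of that idea on toy quadratic blocks, not the authors' implementation. It assumes each block's sensitivity is summarized by its top Hessian eigenvalue, estimated by power iteration. The Hessian-vector product is approximated with a central finite difference of gradients; this is a swapped-in technique chosen to avoid an autodiff library. The per-parameter normalization, the (8, 4, 2) bit rule, and the block names `conv1`, `conv2`, `fc` are hypothetical. The sketch also counts the brute-force search space the abstract describes: b**L bit-width assignments and L! fine-tuning orders.

```python
import math
import random


def search_space_sizes(num_layers, num_bit_choices):
    """Brute-force counts: b**L bit-width assignments and L! fine-tuning orders."""
    return num_bit_choices ** num_layers, math.factorial(num_layers)


def hvp_finite_diff(grad_fn, w, v, eps=1e-4):
    """Approximate H v by a central difference of gradients (swapped-in technique)."""
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = grad_fn(w_plus), grad_fn(w_minus)
    return [(a - b) / (2.0 * eps) for a, b in zip(g_plus, g_minus)]


def top_hessian_eigenvalue(grad_fn, w, iters=50, seed=0):
    """Power iteration on the Hessian using only Hessian-vector products."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in w]
    eigenvalue = 0.0
    for _ in range(iters):
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:  # H v vanished: the block looks flat
            return 0.0
        v = [x / norm for x in v]
        hv = hvp_finite_diff(grad_fn, w, v)
        eigenvalue = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        v = hv
    return eigenvalue


def quadratic_block(curvatures):
    """Hypothetical toy block: loss 0.5 * sum(c * w**2), so its Hessian is diag(curvatures)."""
    return lambda w: [c * wi for c, wi in zip(curvatures, w)]


def rank_layers(blocks):
    """Return (name, top eigenvalue, score) tuples, most sensitive first."""
    rows = []
    for name, (grad_fn, w) in blocks.items():
        lam = top_hessian_eigenvalue(grad_fn, w)
        score = lam / len(w)  # ASSUMPTION: normalize by the block's parameter count
        rows.append((name, lam, score))
    return sorted(rows, key=lambda row: row[2], reverse=True)


def assign_bits(ranked, bit_choices=(8, 4, 2)):
    """HYPOTHETICAL rule: the more sensitive a block, the more bits it keeps."""
    return {
        name: bit_choices[min(rank, len(bit_choices) - 1)]
        for rank, (name, _, _) in enumerate(ranked)
    }


# Hypothetical blocks: (gradient function, current weights).
blocks = {
    "conv1": (quadratic_block([4.0, 1.0, 0.5]), [0.5] * 3),
    "conv2": (quadratic_block([0.2, 0.1, 0.05, 0.1]), [0.5] * 4),
    "fc": (quadratic_block([9.0, 2.0]), [0.5] * 2),
}

if __name__ == "__main__":
    configs, orders = search_space_sizes(num_layers=50, num_bit_choices=3)
    print(f"50 layers, 3 bit-widths: {configs} assignments, {orders} orders")
    ranked = rank_layers(blocks)
    for name, lam, score in ranked:
        print(f"{name}: top eigenvalue {lam:.4f}, score {score:.4f}")
    print(assign_bits(ranked))
```

With these toy curvatures, the ranking is `fc`, then `conv1`, then `conv2`, so the hypothetical rule gives them 8, 4, and 2 bits. Power iteration needs only Hessian-vector products, never the full Hessian. On a real network, H v would typically be computed with automatic differentiation rather than finite differences.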

Cite

Text

Dong et al. "HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00038

Markdown

[Dong et al. "HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/dong2019iccv-hawq/) doi:10.1109/ICCV.2019.00038

BibTeX

@inproceedings{dong2019iccv-hawq,
  title     = {{HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision}},
  author    = {Dong, Zhen and Yao, Zhewei and Gholami, Amir and Mahoney, Michael W. and Keutzer, Kurt},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2019},
  doi       = {10.1109/ICCV.2019.00038},
  url       = {https://mlanthology.org/iccv/2019/dong2019iccv-hawq/}
}