One-Shot Model for Mixed-Precision Quantization

Abstract

Neural network quantization is a popular approach for model compression. Modern hardware supports quantization in mixed-precision mode, which allows for greater compression rates but adds the challenging task of searching for the optimal bit width. The majority of existing searchers find a single mixed-precision architecture. To select an architecture that is suitable in terms of performance and resource consumption, one has to restart the search multiple times. We focus on a specific class of methods that find tensor bit widths using gradient-based optimization. First, we theoretically derive several methods that were previously proposed on empirical grounds. Second, we present a novel One-Shot method that finds a diverse set of Pareto-front architectures in O(1) time. For large models, the proposed method is 5 times more efficient than existing methods. We verify the method on two classification and super-resolution models and show a correlation score above 0.93 between the predicted and actual model performance. The Pareto-front architecture selection is straightforward and takes only 20 to 40 supernet evaluations, which, to the best of our knowledge, is a new state-of-the-art result.
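To illustrate what selecting Pareto-front architectures means, here is a minimal sketch of a generic non-dominated filter. It is not the authors' method: the function name `pareto_front`, the `(cost, score)` representation, and all numbers are hypothetical, and it only assumes that each candidate bit-width configuration has already been assigned a resource cost (lower is better) and a predicted performance score (higher is better).

```python
from typing import List, Tuple


def pareto_front(candidates: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the non-dominated (cost, score) pairs.

    Assumption: lower cost is better and higher score is better.
    """
    # Sort by cost ascending; among equal costs, put the best score first.
    ordered = sorted(candidates, key=lambda c: (c[0], -c[1]))
    front: List[Tuple[float, float]] = []
    best_score = float("-inf")
    for cost, score in ordered:
        # Keep a candidate only if it beats every cheaper (or equally cheap) one.
        if score > best_score:
            front.append((cost, score))
            best_score = score
    return front


# Made-up (average bit width, predicted accuracy) pairs for illustration only.
candidates = [(4.0, 0.70), (8.0, 0.76), (6.0, 0.69), (6.0, 0.74), (8.0, 0.75)]
front = pareto_front(candidates)
```

With a front like this in hand, picking an architecture reduces to choosing the point whose cost fits the deployment budget.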

Cite

Text

Koryakovskiy et al. "One-Shot Model for Mixed-Precision Quantization." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00767

Markdown

[Koryakovskiy et al. "One-Shot Model for Mixed-Precision Quantization." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/koryakovskiy2023cvpr-oneshot/) doi:10.1109/CVPR52729.2023.00767

BibTeX

@inproceedings{koryakovskiy2023cvpr-oneshot,
  title     = {{One-Shot Model for Mixed-Precision Quantization}},
  author    = {Koryakovskiy, Ivan and Yakovleva, Alexandra and Buchnev, Valentin and Isaev, Temur and Odinokikh, Gleb},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {7939--7949},
  doi       = {10.1109/CVPR52729.2023.00767},
  url       = {https://mlanthology.org/cvpr/2023/koryakovskiy2023cvpr-oneshot/}
}