SynQ: Accurate Zero-Shot Quantization by Synthesis-Aware Fine-Tuning

Abstract

How can we accurately quantize a pre-trained model without any data? Quantization algorithms are widely used for deploying neural networks on resource-constrained edge devices. Zero-shot Quantization (ZSQ) addresses the crucial and practical scenario where training data are inaccessible for privacy or security reasons. However, three significant challenges hinder the performance of existing ZSQ methods: 1) noise in the synthetic dataset, 2) predictions based on off-target patterns, and 3) misguidance by erroneous hard labels. In this paper, we propose SynQ (Synthesis-aware Fine-tuning for Zero-shot Quantization), a carefully designed ZSQ framework to overcome the limitations of existing methods. SynQ minimizes the noise in the generated samples by applying a low-pass filter. Then, SynQ improves the accuracy of the quantized model by aligning its class activation map with that of the pre-trained model. Furthermore, SynQ mitigates misguidance from the pre-trained model's errors by using only soft labels for difficult samples. Extensive experiments show that SynQ achieves state-of-the-art accuracy, outperforming existing ZSQ methods.
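To make the three components concrete, the sketch below shows, in PyTorch, how the low-pass filtering, class-activation-map (CAM) alignment, and difficulty-aware soft-label losses might look. This is a minimal illustration, not SynQ's released implementation: every function name, signature, and hyperparameter here (e.g., cutoff, difficulty_threshold) is an assumption made for the example.

import torch
import torch.nn.functional as F

def low_pass_filter(images, cutoff=0.5):
    # Suppress high-frequency noise in synthetic images with a circular
    # FFT mask; `cutoff` (fraction of the spectrum kept) is an assumed knob.
    freq = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    h, w = images.shape[-2:]
    ys = torch.linspace(-1, 1, h, device=images.device).view(-1, 1)
    xs = torch.linspace(-1, 1, w, device=images.device).view(1, -1)
    mask = ((ys ** 2 + xs ** 2).sqrt() <= cutoff).to(images.dtype)
    filtered = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1)))
    return filtered.real

def cam_alignment_loss(feat_q, feat_fp, weight_q, weight_fp, labels):
    # Align the CAMs of the quantized and pre-trained (full-precision) models
    # for the labeled class, so both attend to the same image regions.
    # CAM = class-specific weighted sum over the channels of the final
    # convolutional feature map: (B, C, H, W) -> (B, H, W).
    cam_q = torch.einsum('bchw,bc->bhw', feat_q, weight_q[labels])
    cam_fp = torch.einsum('bchw,bc->bhw', feat_fp, weight_fp[labels])
    return F.mse_loss(cam_q, cam_fp)

def label_loss(logits_q, logits_fp, hard_labels, difficulty_threshold=0.5):
    # Per-sample soft-label (distillation) loss against the teacher's logits.
    soft = F.kl_div(F.log_softmax(logits_q, dim=1),
                    F.softmax(logits_fp, dim=1), reduction='none').sum(dim=1)
    # Per-sample hard-label cross-entropy.
    hard = F.cross_entropy(logits_q, hard_labels, reduction='none')
    # Treat a sample as "difficult" when the teacher's confidence in its hard
    # label is low, and drop the hard-label term there to avoid misguidance.
    teacher_conf = F.softmax(logits_fp, dim=1) \
                    .gather(1, hard_labels[:, None]).squeeze(1)
    easy = (teacher_conf >= difficulty_threshold).float()
    return (soft + easy * hard).mean()

In this sketch, "difficulty" is judged by the teacher's confidence in the hard label, which mirrors the abstract's point: erroneous hard labels are most likely on hard samples, so only the soft-label term is applied there.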

Cite

Text

Kim et al. "SynQ: Accurate Zero-Shot Quantization by Synthesis-Aware Fine-Tuning." International Conference on Learning Representations, 2025.

Markdown

[Kim et al. "SynQ: Accurate Zero-Shot Quantization by Synthesis-Aware Fine-Tuning." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/kim2025iclr-synq/)

BibTeX

@inproceedings{kim2025iclr-synq,
  title     = {{SynQ: Accurate Zero-Shot Quantization by Synthesis-Aware Fine-Tuning}},
  author    = {Kim, Minjun and Kim, Jongjin and Kang, U},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/kim2025iclr-synq/}
}