Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers

Abstract

Data-free quantization (DFQ) enables model quantization without accessing real data, addressing concerns regarding data security and privacy. With the growing adoption of Vision Transformers (ViTs), DFQ for ViTs has garnered significant attention. However, existing DFQ methods exhibit two limitations: (1) semantic distortion, where the semantics of synthetic images deviate substantially from those of real images, and (2) semantic inadequacy, where synthetic images contain extensive regions with limited content and oversimplified textures, leading to suboptimal quantization performance. To address these limitations, we propose SARDFQ, a novel Semantics Alignment and Reinforcement Data-Free Quantization method for ViTs. To address semantic distortion, SARDFQ incorporates Attention Priors Alignment (APA), which optimizes synthetic images to follow randomly generated structure attention priors. To mitigate semantic inadequacy, SARDFQ introduces Multi-Semantic Reinforcement (MSR), leveraging localized patch optimization to enhance semantic richness across synthetic images. Furthermore, SARDFQ employs Soft-Label Learning (SL), wherein multiple semantic targets are adapted to facilitate the learning of multi-semantic images augmented by MSR. Extensive experiments demonstrate that SARDFQ significantly surpasses existing methods. For example, SARDFQ improves top-1 accuracy on ImageNet by 15.52% for W4A4 ViT-B.
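For readers unfamiliar with the "W4A4" notation, it denotes 4-bit weights and 4-bit activations. The sketch below illustrates what restricting values to 4 bits means, using a plain min-max uniform quantizer. This quantizer is an assumption chosen for illustration, not the scheme used by SARDFQ, and the function name `quantize_uniform` and the sample values are hypothetical.

```python
def quantize_uniform(values, num_bits=4):
    """Fake-quantize floats onto 2**num_bits evenly spaced levels.

    Illustrative min-max (asymmetric) quantizer; an assumption,
    not the quantizer described in the paper.
    """
    levels = 2 ** num_bits - 1          # highest integer code (15 for 4 bits)
    lo, hi = min(values), max(values)
    if hi == lo:                        # constant input: nothing to quantize
        return list(values)
    scale = (hi - lo) / levels          # spacing between adjacent levels
    out = []
    for v in values:
        q = round((v - lo) / scale)     # nearest integer code
        q = max(0, min(levels, q))      # clamp into [0, levels]
        out.append(lo + q * scale)      # map the code back to a float
    return out


# Hypothetical weights, quantized to 4 bits as in the "W4" of W4A4.
weights = [-0.8, -0.1, 0.0, 0.35, 0.7]
w4 = quantize_uniform(weights, num_bits=4)
step = (max(weights) - min(weights)) / (2 ** 4 - 1)
```

With 4 bits each value can take at most 16 distinct levels, so every quantized value lies within half a step of its original. This coarse rounding is why low-bit settings such as W4A4 are sensitive to the quality of the calibration images.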

Cite

Text

Zhong et al. "Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers." International Conference on Computer Vision, 2025.

Markdown

[Zhong et al. "Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/zhong2025iccv-semantic/)

BibTeX

@inproceedings{zhong2025iccv-semantic,
  title     = {{Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers}},
  author    = {Zhong, Yunshan and Zhou, Yuyao and Zhang, Yuxin and Sui, Wanchen and Li, Shen and Li, Yong and Chao, Fei and Ji, Rongrong},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {12479-12490},
  url       = {https://mlanthology.org/iccv/2025/zhong2025iccv-semantic/}
}