QuEST: Low-Bit Diffusion Model Quantization via Efficient Selective Finetuning

Abstract

The practical deployment of diffusion models is still hindered by high memory and computational overhead. Although quantization paves the way for model compression and acceleration, existing methods face challenges in achieving low-bit quantization efficiently. In this paper, we identify imbalanced activation distributions as a primary source of quantization difficulty, and propose to adjust these distributions through weight finetuning to make them more quantization-friendly. We provide both theoretical and empirical evidence supporting finetuning as a practical and reliable solution. Building on this approach, we further distinguish two critical types of quantized layers: those responsible for retaining essential temporal information and those particularly sensitive to bit-width reduction. By selectively finetuning these layers under both local and global supervision, we mitigate performance degradation while enhancing quantization efficiency. Our method demonstrates its efficacy across three high-resolution image generation tasks, obtaining state-of-the-art performance across multiple bit-width settings.

Cite

Text

Wang et al. "QuEST: Low-Bit Diffusion Model Quantization via Efficient Selective Finetuning." International Conference on Computer Vision, 2025.

Markdown

[Wang et al. "QuEST: Low-Bit Diffusion Model Quantization via Efficient Selective Finetuning." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/wang2025iccv-quest/)

BibTeX

@inproceedings{wang2025iccv-quest,
  title     = {{QuEST: Low-Bit Diffusion Model Quantization via Efficient Selective Finetuning}},
  author    = {Wang, Haoxuan and Shang, Yuzhang and Yuan, Zhihang and Wu, Junyi and Yan, Junchi and Yan, Yan},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {15542--15551},
  url       = {https://mlanthology.org/iccv/2025/wang2025iccv-quest/}
}