QVGen: Pushing the Limit of Quantized Video Generative Models

Huang, Yushi; Gong, Ruihao; Liu, Jing; Ding, Yifu; Lv, Chengtao; Qin, Haotong; Zhang, Jun

ICLR 2026

Abstract

Video diffusion models (DMs) have enabled high-quality video synthesis. Yet their substantial computational and memory demands pose serious challenges to real-world deployment, even on high-end GPUs. As a commonly adopted solution, quantization has achieved notable success in reducing cost for image DMs, but its direct application to video DMs remains ineffective. In this paper, we present *QVGen*, a novel quantization-aware training (QAT) framework tailored for high-performance and inference-efficient video DMs under extremely low-bit quantization (*e.g.*, $4$-bit or below). We begin with a theoretical analysis demonstrating that reducing the gradient norm is essential to facilitate convergence for QAT. To this end, we introduce auxiliary modules ($\Phi$) to mitigate large quantization errors, leading to significantly enhanced convergence. To eliminate the inference overhead of $\Phi$, we propose a *rank-decay* strategy that progressively removes $\Phi$. Specifically, we repeatedly employ singular value decomposition (SVD) and a proposed rank-based regularization $\boldsymbol{\gamma}$ to identify and decay low-contributing components. This strategy retains performance while zeroing out the additional inference overhead. Extensive experiments across $4$ state-of-the-art (SOTA) video DMs, with parameter sizes ranging from $1.3$B to $14$B, show that QVGen is *the first* to reach quality comparable to full precision under $4$-bit settings. Moreover, it significantly outperforms existing methods. For instance, our $3$-bit CogVideoX-2B achieves improvements of $+25.28$ in Dynamic Degree and $+8.43$ in Scene Consistency on VBench. Code and models are available at https://github.com/ModelTC/QVGen.
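
To make the notion of "large quantization errors" at low bit-widths concrete, the following is a minimal sketch of generic symmetric, per-tensor uniform fake quantization in plain Python. It is not QVGen's quantizer or training code: the function names `fake_quantize` and `mean_squared_error`, the per-tensor max-abs scale, and the toy weight values are all assumptions chosen for illustration.

```python
def fake_quantize(values, bits):
    """Symmetric uniform fake quantization (illustrative, not QVGen's quantizer).

    Maps each value to the nearest point on a signed `bits`-bit integer grid
    and rescales it back to floating point.
    """
    qmax = 2 ** (bits - 1) - 1            # largest positive level, e.g. 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)               # all zeros: nothing to quantize
    scale = max_abs / qmax                # assumed per-tensor max-abs scale
    out = []
    for v in values:
        q = round(v / scale)              # nearest integer level
        q = max(-qmax - 1, min(qmax, q))  # clamp to the signed integer range
        out.append(q * scale)             # dequantize
    return out


def mean_squared_error(a, b):
    """Mean squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical toy weights, not taken from any model.
weights = [0.82, -0.41, 0.07, -0.93, 0.33, 0.15, -0.58, 0.64]

errors = {
    bits: mean_squared_error(weights, fake_quantize(weights, bits))
    for bits in (8, 4, 3)
}
for bits, err in errors.items():
    print(f"{bits}-bit MSE: {err:.6f}")
```

On these toy values the reconstruction error grows as the bit-width drops from 8 to 4 to 3. This is the kind of quantization error that, according to the abstract, the auxiliary modules $\Phi$ are introduced to mitigate during QAT.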

Cite

Text

Huang et al. "QVGen: Pushing the Limit of Quantized Video Generative Models." International Conference on Learning Representations, 2026.

Markdown

[Huang et al. "QVGen: Pushing the Limit of Quantized Video Generative Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/huang2026iclr-qvgen/)

BibTeX

@inproceedings{huang2026iclr-qvgen,
  title     = {{QVGen: Pushing the Limit of Quantized Video Generative Models}},
  author    = {Huang, Yushi and Gong, Ruihao and Liu, Jing and Ding, Yifu and Lv, Chengtao and Qin, Haotong and Zhang, Jun},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/huang2026iclr-qvgen/}
}