QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation

Abstract

Recently, Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation, surpassing U-Net-based models in terms of performance. However, the enhanced capabilities of DiTs come with significant drawbacks, including increased computational and memory costs, which hinder their deployment on resource-constrained devices. Current acceleration techniques, such as quantization and caching mechanisms, offer limited speedup and are often applied in isolation, failing to fully address the complexities of DiT architectures. In this paper, we propose QuantCache, a novel training-free inference acceleration framework that jointly optimizes hierarchical latent caching, adaptive importance-guided quantization, and structural redundancy-aware pruning. QuantCache achieves an end-to-end latency speedup of 6.72x on Open-Sora with minimal loss in generation quality. Extensive evaluations across multiple video generation benchmarks demonstrate the effectiveness of our method, setting a new standard for efficient DiT inference. We will release all code and models to facilitate further research.

Cite

Text

Wu et al. "QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation." International Conference on Computer Vision, 2025.

Markdown

[Wu et al. "QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/wu2025iccv-quantcache/)

BibTeX

@inproceedings{wu2025iccv-quantcache,
  title     = {{QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation}},
  author    = {Wu, Junyi and Li, Zhiteng and Hui, Zheng and Zhang, Yulun and Kong, Linghe and Yang, Xiaokang},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {15035--15044},
  url       = {https://mlanthology.org/iccv/2025/wu2025iccv-quantcache/}
}