dKV-Cache: The Cache for Diffusion Language Models

Xinyin Ma, Runpeng Yu, Gongfan Fang, Xinchao Wang

NeurIPS 2025

Abstract

Diffusion language models (DLMs) are seen as a promising competitor to autoregressive language models (ARs). However, DLMs have long been constrained by slow inference. A core challenge is that their non-autoregressive architecture and bidirectional attention preclude the key-value cache that accelerates decoding. We address this bottleneck by proposing a KV-cache-like mechanism, **d**elayed **KV-Cache**, for the denoising process of DLMs. Our approach is motivated by the observation that different tokens have distinct representation dynamics throughout the diffusion process. Accordingly, we propose a delayed and conditioned caching strategy for key and value states. We design two complementary variants that cache keys and values step by step: (1) dKV-Cache-Decode, which provides almost lossless acceleration and even improves performance on long sequences, suggesting that existing DLMs may under-utilise contextual information during inference; and (2) dKV-Cache-Greedy, which caches aggressively with a reduced cache lifespan, achieving higher speed-ups with quadratic time complexity at the cost of some performance degradation. Overall, dKV-Cache achieves a 2-10$\times$ inference speedup, largely narrowing the gap between ARs and DLMs. We evaluate dKV-Cache on several benchmarks and observe acceleration across general language understanding, mathematical, and code-generation tasks. Experiments demonstrate that caching can also be applied to DLMs, even in a training-free manner on existing DLMs.
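
The idea of delaying when a token's key and value states enter the cache can be illustrated with a small bookkeeping sketch. It is not the authors' implementation: the function name `delayed_cache_schedule`, the `decode_step` input, and the one-step default for `delay` are all assumptions made for illustration, since the abstract does not specify them. The sketch only tracks which positions would have their key/value states recomputed and which would be served from the cache at each denoising step.

```python
def delayed_cache_schedule(decode_step, num_steps, delay=1):
    """Return, for each denoising step, (recomputed, reused) position lists.

    decode_step[i] is the hypothetical step at which position i is unmasked.
    Assumption: a position's key/value states are reused from the cache only
    at steps strictly later than decode_step[i] + delay; before that they
    are recomputed. delay=0 would cache immediately after decoding.
    """
    schedule = []
    for t in range(num_steps):
        reused = [i for i, s in enumerate(decode_step) if t > s + delay]
        recomputed = [i for i, s in enumerate(decode_step) if t <= s + delay]
        schedule.append((recomputed, reused))
    return schedule


# Toy example: positions 0 and 1 are decoded at step 0, position 2 at
# step 1, and position 3 at step 2.
decode_step = [0, 0, 1, 2]
for t, (recomputed, reused) in enumerate(delayed_cache_schedule(decode_step, 4)):
    print(f"step {t}: recompute {recomputed}, reuse {reused}")
```

In this toy schedule, positions 0 and 1 are still recomputed at step 1 and only reused from step 2 onward, which is the sense in which the caching is delayed relative to decoding.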

Cite

Text

Ma et al. "dKV-Cache: The Cache for Diffusion Language Models." Advances in Neural Information Processing Systems, 2025.

Markdown

[Ma et al. "dKV-Cache: The Cache for Diffusion Language Models." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/ma2025neurips-dkvcache/)

BibTeX

@inproceedings{ma2025neurips-dkvcache,
  title     = {{dKV-Cache: The Cache for Diffusion Language Models}},
  author    = {Ma, Xinyin and Yu, Runpeng and Fang, Gongfan and Wang, Xinchao},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/ma2025neurips-dkvcache/}
}