KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction
Abstract
Transformer-based large language models (LLMs) cache context as key-value (KV) pairs during inference. As context length grows, KV cache sizes expand, leading to substantial memory overhead and increased attention latency. This paper introduces \textit{KVzip}, a query-agnostic KV cache eviction method enabling effective reuse of compressed KV caches across diverse queries. KVzip quantifies the importance of a KV pair using the underlying LLM to reconstruct original contexts from cached KV pairs, subsequently evicting pairs with lower importance. Extensive empirical evaluations demonstrate that KVzip reduces KV cache size by $3$-$4\times$ and FlashAttention decoding latency by approximately $2\times$, with negligible performance loss in question-answering, retrieval, reasoning, and code comprehension tasks. Evaluations include various models such as LLaMA3.1, Qwen2.5, and Gemma3, with context lengths reaching up to 170K tokens. KVzip significantly outperforms existing query-aware KV eviction methods, which suffer from performance degradation even at a 90\% cache budget ratio under multi-query scenarios.
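To make the core idea concrete, below is a minimal sketch (not the authors' released code) of query-agnostic importance scoring as described in the abstract: while the model reconstructs the original context from its KV cache, attention weights onto each cached KV position are aggregated into an importance score, and the lowest-scoring pairs are evicted. All tensor shapes, function names, and the max-aggregation choice are illustrative assumptions.

```python
import torch

def kv_importance(attn, num_context_kv):
    """attn: [heads, recon_steps, kv_len] attention weights gathered while the
    model reconstructs the original context from its KV cache.
    Returns one importance score per cached context KV position (assumed scheme)."""
    # Keep only the columns corresponding to the cached context tokens.
    ctx_attn = attn[..., :num_context_kv]
    # Aggregate over reconstruction steps and heads; max is one reasonable choice,
    # since a KV pair needed by *any* reconstruction step should be preserved.
    return ctx_attn.amax(dim=1).amax(dim=0)  # [num_context_kv]

def evict(kv_keys, kv_values, scores, keep_ratio=0.3):
    """Keep the top `keep_ratio` fraction of KV pairs by importance."""
    k = max(1, int(scores.numel() * keep_ratio))
    keep = torch.topk(scores, k).indices.sort().values  # preserve original order
    return kv_keys[:, keep], kv_values[:, keep]

if __name__ == "__main__":
    heads, steps, kv_len, head_dim = 8, 16, 128, 64
    attn = torch.rand(heads, steps, kv_len).softmax(dim=-1)   # toy attention
    keys = torch.randn(heads, kv_len, head_dim)
    values = torch.randn(heads, kv_len, head_dim)
    scores = kv_importance(attn, num_context_kv=kv_len)
    keys_c, values_c = evict(keys, values, scores, keep_ratio=0.3)
    print(keys_c.shape, values_c.shape)  # KV axis shrinks to ~30% of its original size
```

Because the scores are computed from context reconstruction rather than from any particular question, the same compressed cache can then serve arbitrary downstream queries, which is the query-agnostic property the abstract emphasizes.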
Cite
Text

Kim et al. "KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction." Advances in Neural Information Processing Systems, 2025.

Markdown

[Kim et al. "KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/kim2025neurips-kvzip/)

BibTeX
@inproceedings{kim2025neurips-kvzip,
  title     = {{KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction}},
  author    = {Kim, Jang-Hyun and Kim, Jinuk and Kwon, Sangwoo and Lee, Jae W. and Yun, Sangdoo and Song, Hyun Oh},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/kim2025neurips-kvzip/}
}