LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models

Abstract

Recent advancements in Large Language Models (LLMs) have spurred interest in numerous applications requiring robust long-range capabilities, essential for processing extensive input contexts and continuously generating extended outputs. As sequence lengths increase, the number of Key-Value (KV) pairs in LLMs escalates, creating a significant efficiency bottleneck. In this paper, we propose a new KV cache optimization paradigm called LaCache, a training-free method for efficient and accurate generative inference of LLMs. LaCache enables LLMs to simultaneously address both critical challenges in long-range modeling: robust long-range capabilities and continuous generation without running out of memory (OOM). Specifically, LaCache integrates two key innovations: (1) a ladder-shaped KV cache pattern that stores KV pairs not only sequentially (left-to-right within each layer) but also across layers (from shallow to deep), providing an extended span for capturing long-range dependencies under a fixed storage budget, thereby boosting long-range capabilities; and (2) an iterative compaction mechanism that progressively compresses older caches, freeing up space for new tokens within a fixed cache size. This token distance-based dynamic compression enables more effective continuous generation under constrained cache budgets. Experiments across various tasks, benchmarks, and LLMs consistently validate LaCache’s effectiveness in enhancing LLMs’ long-range capabilities. Our code is available at https://github.com/GATECH-EIC/LaCache.
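
As a rough illustration of the two mechanisms described above, the minimal Python sketch below (not the authors' implementation; the function names, the parameters budget, sink, and stride, and the specific shift and thinning rules are illustrative assumptions) shows how a layer-shifted ("ladder") retention pattern lets a fixed per-layer budget collectively cover a longer span of token positions, and how a simple distance-based compaction step frees room for new tokens under a fixed cache size.

def ladder_keep_indices(seq_len, layer_idx, budget=256, sink=4, stride=8):
    """Token indices whose KV pairs this layer retains (illustrative only).

    A few initial "sink" tokens are always kept; the remaining budget is a
    recency window whose start is shifted back by `stride` positions per
    layer, so the union of all layers' windows spans a longer range of
    positions than any single layer's fixed budget.
    """
    keep = list(range(min(sink, seq_len)))        # always keep the sink tokens
    window = budget - sink                        # per-layer recency budget
    offset = layer_idx * stride                   # ladder shift for this layer
    start = max(sink, seq_len - window - offset)  # shifted window start
    end = max(start, seq_len - offset)            # shifted window end
    keep.extend(range(start, min(end, seq_len)))
    return sorted(set(keep))


def compact_oldest(kv_entries, capacity, sink=4, keep_every=2):
    """Distance-based compaction (illustrative only): when the cache exceeds
    `capacity`, thin out a stretch of the oldest non-sink entries (twice the
    overflow) by keeping every `keep_every`-th one, making room for new tokens."""
    if len(kv_entries) <= capacity:
        return kv_entries
    overflow = len(kv_entries) - capacity
    old = kv_entries[sink:sink + 2 * overflow]    # oldest span, twice the overflow
    thinned = old[::keep_every]                   # keep every other old entry
    return kv_entries[:sink] + thinned + kv_entries[sink + 2 * overflow:]


if __name__ == "__main__":
    # With 32 layers, a 256-token budget, and stride 8, the union of all
    # layers' windows covers roughly 256 + 31 * 8 positions, i.e. a longer
    # effective span than the fixed per-layer budget alone.
    for layer in (0, 15, 31):
        idx = ladder_keep_indices(seq_len=4096, layer_idx=layer)
        print(f"layer {layer:2d}: keeps {len(idx)} tokens, span [{idx[0]}..{idx[-1]}]")
    print(len(compact_oldest(list(range(300)), capacity=256)))  # -> 256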

Cite

Text

Shi et al. "LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Shi et al. "LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/shi2025icml-lacache/)

BibTeX

@inproceedings{shi2025icml-lacache,
  title     = {{LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models}},
  author    = {Shi, Dachuan and Fu, Yonggan and Yuan, Xiangchi and Yu, Zhongzhi and You, Haoran and Li, Sixu and Dong, Xin and Kautz, Jan and Molchanov, Pavlo and Lin, Yingyan Celine},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {54892--54903},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/shi2025icml-lacache/}
}