Lee, Mingu
8 publications

KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments. NeurIPS 2025.
Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement. ICLRW 2024.