Dong, Harry

4 publications

ICML 2025. "ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference." Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen.
ICML 2024. "Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference." Harry Dong, Xinyu Yang, Zhenyu Zhang, Zhangyang Wang, Yuejie Chi, Beidi Chen.
ICMLW 2024. "Prompt-Prompted Adaptive Structured Pruning for Efficient LLM Generation." Harry Dong, Beidi Chen, Yuejie Chi.
ICMLW 2023. "Towards Structured Sparsity in Transformers for Efficient Inference." Harry Dong, Beidi Chen, Yuejie Chi.