Wei, Xuechao

1 publication

NeurIPS 2024. ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction. Renze Chen, Zhuofeng Wang, Beiquan Cao, Tong Wu, Size Zheng, Xiuhong Li, Xuechao Wei, Shengen Yan, Meng Li, Yun Liang