Yu, Xianzhi
8 publications
ICLR
2026
Scaling up, Speeding up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling
Shengyin Sun, Yiming Li, Xing Li, Yingzhao Lian, Weizhe Lin, Hui-Ling Zhen, Zhiyuan Yang, Xianzhi Yu, Chen Chen, Mingxuan Yuan, Chen Ma NeurIPS
2025
AttentionPredictor: Temporal Patterns Matter for KV Cache Compression
Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Chen Chen, Lei Chen, Xianzhi Yu, Wulong Liu, Jianye Hao, Mingxuan Yuan, Bin Li ICML
2025
FlatQuant: Flatness Matters for LLM Quantization
Yuxuan Sun, Ruikang Liu, Haoli Bai, Han Bao, Kang Zhao, Yuening Li, Jiaxin Hu, Xianzhi Yu, Lu Hou, Chun Yuan, Xin Jiang, Wulong Liu, Jun Yao