Leng, Jingwen

7 publications

ICML 2025. An Efficient Private GPT Never Autoregressively Decodes. Zhengyi Li, Yue Guan, Kang Yang, Yu Feng, Ning Liu, Yu Yu, Jingwen Leng, Minyi Guo
NeurIPS 2025. ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive. Xinhao Luo, Zihan Liu, Yangjie Zhou, Shihan Fang, Ziyu Huang, Yu Feng, Chen Zhang, Shixuan Sun, Zhenzhe Zheng, Jingwen Leng, Minyi Guo
IJCAI 2025. TreeKV: Smooth Key-Value Cache Compression with Tree Structures. Ziwei He, Jian Yuan, Haoli Bai, Jingwen Leng, Bo Jiang
NeurIPS 2025. Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding. Yue Guan, Changming Yu, Shihan Fang, Weiming Hu, Zaifeng Pan, Zheng Wang, Zihan Liu, Yangjie Zhou, Yufei Ding, Minyi Guo, Jingwen Leng
NeurIPS 2024. Nimbus: Secure and Efficient Two-Party Inference for Transformers. Zhengyi Li, Kang Yang, Jin Tan, Wen-jie Lu, Haoqi Wu, Xiao Wang, Yu Yu, Derun Zhao, Yancheng Zheng, Minyi Guo, Jingwen Leng
AAAI 2022. Block-Skim: Efficient Question Answering for Transformer. Yue Guan, Zhengyi Li, Zhouhan Lin, Yuhao Zhu, Jingwen Leng, Minyi Guo
ICLR 2022. SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation. Cong Guo, Yuxian Qiu, Jingwen Leng, Xiaotian Gao, Chen Zhang, Yunxin Liu, Fan Yang, Yuhao Zhu, Minyi Guo