ML Anthology
Zhu, Kan
3 publications
ICLR 2025
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Keisuke Kamahori, Tian Tang, Yile Gu, Kan Zhu, Baris Kasikci
ICLRW 2024
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Keisuke Kamahori, Yile Gu, Kan Zhu, Baris Kasikci
ICML 2024
QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han