Zhu, Kan

3 publications

ICLR 2025 Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models Keisuke Kamahori, Tian Tang, Yile Gu, Kan Zhu, Baris Kasikci
ICLRW 2024 Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models Keisuke Kamahori, Yile Gu, Kan Zhu, Baris Kasikci
ICML 2024 QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han