Kasikci, Baris

4 publications

ICLR 2026 Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs Kan Zhu, Tian Tang, Qinyu Xu, Zhan Jin, Yile Gu, Zhichen Zeng, Rohan Kadekodi, Liangyu Zhao, Ang Li, Arvind Krishnamurthy, Baris Kasikci
ICLR 2025 Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models Keisuke Kamahori, Tian Tang, Yile Gu, Kan Zhu, Baris Kasikci
ICLRW 2024 Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models Keisuke Kamahori, Yile Gu, Kan Zhu, Baris Kasikci
ICML 2024 QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han