Klimovic, Ana

3 publications

ICLRW 2025 DeltaMoE: Memory-Efficient Inference for Merged Mixture of Experts with Delta Compression Boyko Borisov, Xiaozhe Yao, Nezihe Merve Gürel, Ana Klimovic
ICML 2025 Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs Youhe Jiang, Fangcheng Fu, Xiaozhe Yao, Guoliang He, Xupeng Miao, Ana Klimovic, Bin Cui, Binhang Yuan, Eiko Yoneki
ICML 2024 DéjàVu: KV-Cache Streaming for Fast, Fault-Tolerant Generative LLM Serving Foteini Strati, Sara McAllister, Amar Phanishayee, Jakub Tarnawski, Ana Klimovic