Hosmer, Basil

1 publication

ICML 2024. CHAI: Clustered Head Attention for Efficient LLM Inference. Saurabh Agarwal, Bilge Acun, Basil Hosmer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu.