Huang, Kezhao

1 publication

ICMLW 2024 FastDecode: High-Throughput LLM Serving Through Disaggregating Attention Computation Jiaao He, Kezhao Huang, Jidong Zhai