Yi, Jonah

2 publications

NeurIPS 2024. KV Cache Is 1 Bit per Channel: Efficient Large Language Model Inference with Coupled Quantization. Tianyi Zhang, Jonah Yi, Zhaozhuo Xu, Anshumali Shrivastava.
NeurIPS 2024. NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-Add-Free Attention. Tianyi Zhang, Jonah Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava.