Yao, Bowen

1 publication

NeurIPS 2024. NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-Add-Free Attention. Tianyi Zhang, Jonah Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava.