May, Avner

10 publications

ICML 2025 Cost-Efficient Collaboration Between On-Device and Cloud Language Models Avanika Narayan, Dan Biderman, Sabri Eyuboglu, Avner May, Scott Linderman, James Zou, Christopher Ré

ICLRW 2025 Cost-Efficient Collaboration Between On-Device and Cloud Language Models Avanika Narayan, Sabri Eyuboglu, Dan Biderman, Avner May, Scott Linderman, James Zou, Christopher Ré

ICLR 2025 MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Ranajoy Sadhukhan, Jian Chen, Zhuoming Chen, Vashisth Tiwari, Ruihang Lai, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Tianqi Chen, Beidi Chen

NeurIPS 2024 Sequoia: Scalable and Robust Speculative Decoding Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen

NeurIPS 2024 SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin

NeurIPS 2024 The Mamba in the Llama: Distilling and Accelerating Hybrid Models Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao

ICMLW 2024 The Mamba in the Llama: Distilling and Accelerating Hybrid Models Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao

JMLR 2019 Kernel Approximation Methods for Speech Recognition Avner May, Alireza Bagheri Garakani, Zhiyun Lu, Dong Guo, Kuan Liu, Aurélien Bellet, Linxi Fan, Michael Collins, Daniel Hsu, Brian Kingsbury, Michael Picheny, Fei Sha

AISTATS 2019 Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation Jian Zhang, Avner May, Tri Dao, Christopher Ré

NeurIPS 2019 On the Downstream Performance of Compressed Word Embeddings Avner May, Jian Zhang, Tri Dao, Christopher Ré