May, Avner

10 publications

ICML 2025 Cost-Efficient Collaboration Between On-Device and Cloud Language Models Avanika Narayan, Dan Biderman, Sabri Eyuboglu, Avner May, Scott Linderman, James Zou, Christopher Ré
ICLRW 2025 Cost-Efficient Collaboration Between On-Device and Cloud Language Models Avanika Narayan, Sabri Eyuboglu, Dan Biderman, Avner May, Scott Linderman, James Zou, Christopher Ré
ICLR 2025 MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Ranajoy Sadhukhan, Jian Chen, Zhuoming Chen, Vashisth Tiwari, Ruihang Lai, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Tianqi Chen, Beidi Chen
NeurIPS 2024 Sequoia: Scalable and Robust Speculative Decoding Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen
NeurIPS 2024 SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin
NeurIPS 2024 The Mamba in the Llama: Distilling and Accelerating Hybrid Models Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao
ICMLW 2024 The Mamba in the Llama: Distilling and Accelerating Hybrid Models Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao
JMLR 2019 Kernel Approximation Methods for Speech Recognition Avner May, Alireza Bagheri Garakani, Zhiyun Lu, Dong Guo, Kuan Liu, Aurélien Bellet, Linxi Fan, Michael Collins, Daniel Hsu, Brian Kingsbury, Michael Picheny, Fei Sha
AISTATS 2019 Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation Jian Zhang, Avner May, Tri Dao, Christopher Ré
NeurIPS 2019 On the Downstream Performance of Compressed Word Embeddings Avner May, Jian Zhang, Tri Dao, Christopher Ré