Chen, Beidi

64 publications

ICLR 2025 APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding Xinyu Yang, Tianqi Chen, Beidi Chen
NeurIPS 2025 Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts Haizhong Zheng, Yang Zhou, Brian R. Bartoldson, Bhavya Kailkhura, Fan Lai, Jiawei Zhao, Beidi Chen
ICML 2025 GSM-$\infty$: How Do Your LLMs Behave over Infinitely Increasing Reasoning Complexity and Context Length? Yang Zhou, Hongyi Liu, Zhuoming Chen, Yuandong Tian, Beidi Chen
NeurIPS 2025 Kinetics: Rethinking Test-Time Scaling Law Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Beidi Chen
ICLR 2025 MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Ranajoy Sadhukhan, Jian Chen, Zhuoming Chen, Vashisth Tiwari, Ruihang Lai, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Tianqi Chen, Beidi Chen
ICLR 2025 MagicPIG: LSH Sampling for Efficient LLM Generation Zhuoming Chen, Ranajoy Sadhukhan, Zihao Ye, Yang Zhou, Jianyu Zhang, Niklas Nolte, Yuandong Tian, Matthijs Douze, Leon Bottou, Zhihao Jia, Beidi Chen
ICLR 2025 Memory Mosaics Jianyu Zhang, Niklas Nolte, Ranajoy Sadhukhan, Beidi Chen, Leon Bottou
NeurIPS 2025 Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Xinyu Yang, Yuwei An, Hongyi Liu, Tianqi Chen, Beidi Chen
TMLR 2025 Reliable and Responsible Foundation Models Xinyu Yang, Junlin Han, Rishi Bommasani, Jinqi Luo, Wenjie Qu, Wangchunshu Zhou, Adel Bibi, Xiyao Wang, Jaehong Yoon, Elias Stengel-Eskin, Shengbang Tong, Lingfeng Shen, Rafael Rafailov, Runjia Li, Zhaoyang Wang, Yiyang Zhou, Chenhang Cui, Yu Wang, Wenhao Zheng, Huichi Zhou, Jindong Gu, Zhaorun Chen, Peng Xia, Tony Lee, Thomas P Zollo, Vikash Sehwag, Jixuan Leng, Jiuhai Chen, Yuxin Wen, Huan Zhang, Zhun Deng, Linjun Zhang, Pavel Izmailov, Pang Wei Koh, Yulia Tsvetkov, Andrew Gordon Wilson, Jiaheng Zhang, James Zou, Cihang Xie, Hao Wang, Philip Torr, Julian McAuley, David Alvarez-Melis, Florian Tramèr, Kaidi Xu, Suman Jana, Chris Callison-Burch, Rene Vidal, Filippos Kokkinos, Mohit Bansal, Beidi Chen, Huaxiu Yao
ICML 2025 ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen
ICML 2025 Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation Jingyu Liu, Beidi Chen, Ce Zhang
ICLR 2025 Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity Wentao Guo, Jikai Long, Yimeng Zeng, Zirui Liu, Xinyu Yang, Yide Ran, Jacob R. Gardner, Osbert Bastani, Christopher De Sa, Xiaodong Yu, Beidi Chen, Zhaozhuo Xu
NeurIPS 2024 $\texttt{Model-GLUE}$: Democratized LLM Scaling for a Large Model Zoo in the Wild Xinyu Zhao, Guoheng Sun, Ruisi Cai, Yukun Zhou, Pingzhi Li, Peihao Wang, Bowen Tan, Yexiao He, Li Chen, Yi Liang, Beidi Chen, Binhang Yuan, Hongyi Wang, Ang Li, Zhangyang Wang, Tianlong Chen
NeurIPSW 2024 APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding Xinyu Yang, Tianqi Chen, Beidi Chen
ICLR 2024 Efficient Streaming Language Models with Attention Sinks Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis
NeurIPS 2024 Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang
ICML 2024 GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, Yuandong Tian
ICLRW 2024 GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, Yuandong Tian
ICML 2024 Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference Harry Dong, Xinyu Yang, Zhenyu Zhang, Zhangyang Wang, Yuejie Chi, Beidi Chen
ICML 2024 HexGen: Generative Inference of Large Language Model over Heterogeneous Environment Youhe Jiang, Ran Yan, Xiaozhe Yao, Yang Zhou, Beidi Chen, Binhang Yuan
ICMLW 2024 It Takes Two: On the Seamlessness Between Reward and Policy Model in RLHF TaiMing Lu, Lingfeng Shen, Xinyu Yang, Weiting Tan, Beidi Chen, Huaxiu Yao
ICLR 2024 JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Shaolei Du
ICML 2024 KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache Zirui Liu, Jiayi Yuan, Hongye Jin, Shaochen Zhong, Zhaozhuo Xu, Vladimir Braverman, Beidi Chen, Xia Hu
NeurIPS 2024 Learn to Be Efficient: Build Structured Sparsity in Large Language Models Haizhong Zheng, Xiaoyan Bai, Xueshen Liu, Z. Morley Mao, Beidi Chen, Fan Lai, Atul Prakash
ICML 2024 LoCoCo: Dropping in Convolutions for Long Context Compression Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen
ICMLW 2024 MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training Cheng Luo, Jiawei Zhao, Zhuoming Chen, Beidi Chen, Anima Anandkumar
NeurIPSW 2024 MagicPIG: LSH Sampling for Efficient LLM Generation Zhuoming Chen, Ranajoy Sadhukhan, Zihao Ye, Yang Zhou, Jianyu Zhang, Niklas Nolte, Yuandong Tian, Matthijs Douze, Leon Bottou, Zhihao Jia, Beidi Chen
NeurIPS 2024 Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou
ICLRW 2024 Memorization and Privacy Risks in Domain-Specific Large Language Models Xinyu Yang, Zichen Wen, Wenjie Qu, Zhaorun Chen, Zhiying Xiang, Beidi Chen, Huaxiu Yao
NeurIPS 2024 Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training Cheng Luo, Jiawei Zhao, Zhuoming Chen, Beidi Chen, Anima Anandkumar
NeurIPS 2024 Nearest Neighbor Speculative Decoding for LLM Generation and Attribution Minghan Li, Xilun Chen, Ari Holtzman, Beidi Chen, Jimmy Lin, Wen-tau Yih, Xi Victoria Lin
NeurIPS 2024 On the Surprising Effectiveness of Attention Transfer for Vision Transformers Alexander C. Li, Yuandong Tian, Beidi Chen, Deepak Pathak, Xinlei Chen
ICMLW 2024 Prompt-Prompted Adaptive Structured Pruning for Efficient LLM Generation Harry Dong, Beidi Chen, Yuejie Chi
NeurIPS 2024 S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-Tuning by Structured Sparsity Xinyu Yang, Jixuan Leng, Geyang Guo, Jiawei Zhao, Ryumei Nakada, Linjun Zhang, Huaxiu Yao, Beidi Chen
NeurIPS 2024 SIRIUS: Contextual Sparsity with Correction for Efficient LLMs Yang Zhou, Zhuoming Chen, Zhaozhuo Xu, Xi Victoria Lin, Beidi Chen
NeurIPS 2024 Sequoia: Scalable and Robust Speculative Decoding Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen
NeurIPSW 2024 Sirius: Contextual Sparsity with Correction for Efficient LLMs Yang Zhou, Zhuoming Chen, Zhaozhuo Xu, Xi Victoria Lin, Beidi Chen
ICML 2024 Soft Prompt Recovers Compressed LLMs, Transferably Zhaozhuo Xu, Zirui Liu, Beidi Chen, Shaochen Zhong, Yuxin Tang, Jue Wang, Kaixiong Zhou, Xia Hu, Anshumali Shrivastava
NeurIPS 2024 SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin
ICMLW 2024 Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity Wentao Guo, Jikai Long, Yimeng Zeng, Zirui Liu, Xinyu Yang, Yide Ran, Jacob R. Gardner, Osbert Bastani, Christopher De Sa, Xiaodong Yu, Beidi Chen, Zhaozhuo Xu
ICML 2023 CocktailSGD: Fine-Tuning Foundation Models over 500Mbps Networks Jue Wang, Yucheng Lu, Binhang Yuan, Beidi Chen, Percy Liang, Christopher De Sa, Christopher Re, Ce Zhang
ICML 2023 Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, Beidi Chen
COLT 2023 Fast Algorithms for a New Relaxation of Optimal Transport Moses Charikar, Beidi Chen, Christopher Ré, Erik Waingarten
ICML 2023 FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Beidi Chen, Percy Liang, Christopher Re, Ion Stoica, Ce Zhang
NeurIPS 2023 H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang "Atlas" Wang, Beidi Chen
ICMLW 2023 H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Re, Clark Barrett, Zhangyang Wang, Beidi Chen
ICMLW 2023 Incremental Low-Rank Learning Jiawei Zhao, Yifei Zhang, Beidi Chen, Florian Tobias Schaefer, Anima Anandkumar
NeurIPSW 2023 JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Du
NeurIPS 2023 Laughing Hyena Distillery: Extracting Compact Recurrences from Convolutions Stefano Massaroli, Michael Poli, Dan Fu, Hermann Kumbong, Rom Parnichkun, David Romero, Aman Timalsina, Quinn McIntyre, Beidi Chen, Atri Rudra, Ce Zhang, Christopher Ré, Stefano Ermon, Yoshua Bengio
NeurIPS 2023 Scan and Snap: Understanding Training Dynamics and Token Composition in 1-Layer Transformer Yuandong Tian, Yiping Wang, Beidi Chen, Simon S Du
ICMLW 2023 Towards Structured Sparsity in Transformers for Efficient Inference Harry Dong, Beidi Chen, Yuejie Chi
NeurIPS 2022 Decentralized Training of Foundation Models in Heterogeneous Environments Binhang Yuan, Yongjun He, Jared Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Ré, Ce Zhang
NeurIPS 2022 Fine-Tuning Language Models over Slow Networks Using Activation Quantization with Guarantees Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Ré, Ce Zhang
ICML 2022 Monarch: Expressive Structured Matrices for Efficient and Accurate Training Tri Dao, Beidi Chen, Nimit S Sohoni, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher Re
ICLR 2022 Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models Beidi Chen, Tri Dao, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher Re
ICML 2021 A Tale of Two Efficient and Informative Negative Sampling Distributions Shabnam Daghaghi, Tharun Medini, Nicholas Meisburger, Beidi Chen, Mengnan Zhao, Anshumali Shrivastava
NeurIPS 2021 Locality Sensitive Teaching Zhaozhuo Xu, Beidi Chen, Chaojian Li, Weiyang Liu, Le Song, Yingyan Lin, Anshumali Shrivastava
ICLR 2021 MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training Beidi Chen, Zichang Liu, Binghui Peng, Zhaozhuo Xu, Jonathan Lingjie Li, Tri Dao, Zhao Song, Anshumali Shrivastava, Christopher Re
ICLR 2021 SOLAR: Sparse Orthogonal Learned and Random Embeddings Tharun Medini, Beidi Chen, Anshumali Shrivastava
NeurIPS 2021 Scatterbrain: Unifying Sparse and Low-Rank Attention Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré
ICML 2020 Angular Visual Hardness Beidi Chen, Weiyang Liu, Zhiding Yu, Jan Kautz, Anshumali Shrivastava, Animesh Garg, Animashree Anandkumar
ICMLW 2019 Angular Visual Hardness Beidi Chen, Weiyang Liu, Animesh Garg, Zhiding Yu, Anshumali Shrivastava, Anima Anandkumar
NeurIPS 2019 Fast and Accurate Stochastic Gradient Estimation Beidi Chen, Yingchen Xu, Anshumali Shrivastava
UAI 2018 Densified Winner Take All (WTA) Hashing for Sparse Datasets Beidi Chen, Anshumali Shrivastava