Yao, Zhewei

23 publications

ICLRW 2025. ReFoRCE: A Text-to-SQL Agent with Self-Refinement, Format Restriction, and Column Exploration. Minghang Deng, Ashwin Ramachandran, Canwen Xu, Lanxiang Hu, Zhewei Yao, Anupam Datta, Hao Zhang.
AAAI 2024. DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing. Conglong Li, Zhewei Yao, Xiaoxia Wu, Minjia Zhang, Connor Holmes, Cheng Li, Yuxiong He.
AAAI 2024. Exploring Post-Training Quantization in LLMs from Comprehensive Study to Low Rank Compensation. Zhewei Yao, Xiaoxia Wu, Cheng Li, Stephen Youn, Yuxiong He.
NeurIPS 2024. Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding. Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang.
ICLR 2024. ZeRO++: Extremely Efficient Collective Communication for Large Model Training. Guanhua Wang, Heyang Qin, Sam Ade Jacobs, Xiaoxia Wu, Connor Holmes, Zhewei Yao, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, Yuxiong He.
ICLR 2023. DySR: Adaptive Super-Resolution via Algorithm and System Co-Design. Syed Zawad, Cheng Li, Zhewei Yao, Elton Zheng, Yuxiong He, Feng Yan.
ICML 2023. Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases. Xiaoxia Wu, Cheng Li, Reza Yazdani Aminabadi, Zhewei Yao, Yuxiong He.
ICML 2022. DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He.
WACV 2022. Hessian-Aware Pruning and Optimal Neural Implant. Shixing Yu, Zhewei Yao, Amir Gholami, Zhen Dong, Sehoon Kim, Michael W. Mahoney, Kurt Keutzer.
ICLR 2022. How Much Can CLIP Benefit Vision-and-Language Tasks? Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Zhewei Yao, Kurt Keutzer.
NeurIPS 2022. XTC: Extreme Compression for Pre-Trained Transformers Made Simple and Efficient. Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He.
NeurIPS 2022. ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers. Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He.
AAAI 2021. ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning. Zhewei Yao, Amir Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, Michael W. Mahoney.
ICML 2021. ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training. Jianfei Chen, Lianmin Zheng, Zhewei Yao, Dequan Wang, Ion Stoica, Michael Mahoney, Joseph Gonzalez.
ICML 2021. HAWQ-V3: Dyadic Neural Network Quantization. Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael Mahoney, Kurt Keutzer.
ICML 2021. I-BERT: Integer-Only BERT Quantization. Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
NeurIPS 2020. A Statistical Framework for Low-Bitwidth Training of Deep Neural Networks. Jianfei Chen, Yu Gai, Zhewei Yao, Michael W. Mahoney, Joseph E. Gonzalez.
NeurIPS 2020. HAWQ-V2: Hessian Aware Trace-Weighted Quantization of Neural Networks. Zhen Dong, Zhewei Yao, Daiyaan Arfeen, Amir Gholami, Michael W. Mahoney, Kurt Keutzer.
AAAI 2020. Inefficiency of K-FAC for Large Batch Size Training. Linjian Ma, Gabe Montague, Jiayu Ye, Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael W. Mahoney.
ICML 2020. PowerNorm: Rethinking Batch Normalization in Transformers. Sheng Shen, Zhewei Yao, Amir Gholami, Michael Mahoney, Kurt Keutzer.
AAAI 2020. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT. Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer.
NeurIPS 2019. ANODEV2: A Coupled Neural ODE Framework. Tianjun Zhang, Zhewei Yao, Amir Gholami, Joseph E. Gonzalez, Kurt Keutzer, Michael W. Mahoney, George Biros.
NeurIPS 2018. Hessian-Based Analysis of Large Batch Training and Robustness to Adversaries. Zhewei Yao, Amir Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney.