Allen-Zhu, Zeyuan

39 publications

TMLR 2025. Physics of Language Models: Part 1, Learning Hierarchical Language Structures. Zeyuan Allen-Zhu, Yuanzhi Li
ICLR 2025. Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process. Tian Ye, Zicheng Xu, Yuanzhi Li, Zeyuan Allen-Zhu
ICLR 2025. Physics of Language Models: Part 2.2, How to Learn from Mistakes on Grade-School Math Problems. Tian Ye, Zicheng Xu, Yuanzhi Li, Zeyuan Allen-Zhu
ICLR 2025. Physics of Language Models: Part 3.2, Knowledge Manipulation. Zeyuan Allen-Zhu, Yuanzhi Li
ICLR 2025. Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws. Zeyuan Allen-Zhu, Yuanzhi Li
NeurIPS 2025. Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers. Zeyuan Allen-Zhu
ICML 2024. Physics of Language Models: Part 3.1, Knowledge Storage and Extraction. Zeyuan Allen-Zhu, Yuanzhi Li
COLT 2023. Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning. Zeyuan Allen-Zhu, Yuanzhi Li
ICLR 2023. Forward Super-Resolution: How Can GANs Learn Hierarchical Generative Models for Real-World Distributions. Zeyuan Allen-Zhu, Yuanzhi Li
NeurIPS 2023. SALSA VERDE: A Machine Learning Attack on LWE with Sparse Small Secrets. Cathy Li, Emily Wenger, Zeyuan Allen-Zhu, Francois Charton, Kristin E. Lauter
ICLR 2023. Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning. Zeyuan Allen-Zhu, Yuanzhi Li
ICLR 2022. LoRA: Low-Rank Adaptation of Large Language Models. Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
ICLR 2021. Byzantine-Resilient Non-Convex Stochastic Gradient Descent. Zeyuan Allen-Zhu, Faeze Ebrahimianghazani, Jerry Li, Dan Alistarh
ICML 2019. A Convergence Theory for Deep Learning via Over-Parameterization. Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song
NeurIPS 2019. Can SGD Learn Recurrent Neural Networks with Provable Generalization? Zeyuan Allen-Zhu, Yuanzhi Li
NeurIPS 2019. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers. Zeyuan Allen-Zhu, Yuanzhi Li, Yingyu Liang
NeurIPS 2019. On the Convergence Rate of Training Recurrent Neural Networks. Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song
NeurIPS 2019. What Can ResNet Learn Efficiently, Going Beyond Kernels? Zeyuan Allen-Zhu, Yuanzhi Li
NeurIPS 2018. Byzantine Stochastic Gradient Descent. Dan Alistarh, Zeyuan Allen-Zhu, Jerry Li
NeurIPS 2018. How to Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD. Zeyuan Allen-Zhu
NeurIPS 2018. Is Q-Learning Provably Efficient? Chi Jin, Zeyuan Allen-Zhu, Sebastien Bubeck, Michael I. Jordan
ICML 2018. Katyusha X: Simple Momentum Method for Stochastic Sum-of-Nonconvex Optimization. Zeyuan Allen-Zhu
ICML 2018. Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits. Zeyuan Allen-Zhu, Sebastien Bubeck, Yuanzhi Li
NeurIPS 2018. NEON2: Finding Local Minima via First-Order Oracles. Zeyuan Allen-Zhu, Yuanzhi Li
NeurIPS 2018. Natasha 2: Faster Non-Convex Optimization than SGD. Zeyuan Allen-Zhu
NeurIPS 2018. The Lingering of Gradients: How to Reuse Gradients over Time. Zeyuan Allen-Zhu, David Simchi-Levi, Xinshang Wang
ICML 2017. Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition. Zeyuan Allen-Zhu, Yuanzhi Li
ICML 2017. Faster Principal Component Regression and Stable Matrix Chebyshev Approximation. Zeyuan Allen-Zhu, Yuanzhi Li
ICML 2017. Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU. Zeyuan Allen-Zhu, Yuanzhi Li
NeurIPS 2017. Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls. Zeyuan Allen-Zhu, Elad Hazan, Wei Hu, Yuanzhi Li
ICML 2017. Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter. Zeyuan Allen-Zhu
ICML 2017. Near-Optimal Design of Experiments via Regret Minimization. Zeyuan Allen-Zhu, Yuanzhi Li, Aarti Singh, Yining Wang
ICML 2016. Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling. Zeyuan Allen-Zhu, Zheng Qu, Peter Richtarik, Yang Yuan
NeurIPS 2016. Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters. Zeyuan Allen-Zhu, Yang Yuan, Karthik Sridharan
ICML 2016. Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives. Zeyuan Allen-Zhu, Yang Yuan
NeurIPS 2016. LazySVD: Even Faster SVD Decomposition yet Without Agonizing Pain. Zeyuan Allen-Zhu, Yuanzhi Li
NeurIPS 2016. Optimal Black-Box Reductions Between Optimization Objectives. Zeyuan Allen-Zhu, Elad Hazan
ICML 2016. Variance Reduction for Faster Non-Convex Optimization. Zeyuan Allen-Zhu, Elad Hazan
ICML 2013. A Local Algorithm for Finding Well-Connected Clusters. Zeyuan Allen-Zhu, Silvio Lattanzi, Vahab Mirrokni