Lyu, Kaifeng

32 publications

ICLR 2025 · A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules · Kairong Luo, Haodong Wen, Shengding Hu, Zhenbo Sun, Zhiyuan Liu, Maosong Sun, Kaifeng Lyu, Wenguang Chen
NeurIPS 2025 · Adam Reduces a Unique Form of Sharpness: Theoretical Insights near the Minimizer Manifold · Xinghan Li, Haodong Wen, Kaifeng Lyu
NeurIPS 2025 · Data Mixing Can Induce Phase Transitions in Knowledge Acquisition · Xinran Gu, Kaifeng Lyu, Jiazheng Li, Jingzhao Zhang
ICLRW 2025 · Data Mixing Can Induce Phase Transitions in Knowledge Acquisition · Xinran Gu, Kaifeng Lyu, Jiazheng Li, Jingzhao Zhang
ICLR 2025 · Efficient Stagewise Pretraining via Progressive Subnetworks · Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar
ICLR 2025 · Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks · Binghui Li, Zhixuan Pan, Kaifeng Lyu, Jian Li
NeurIPS 2025 · How Far Are We from Optimal Reasoning Efficiency? · Jiaxuan Gao, Shu Yan, Qixin Tan, Lu Yang, Shusheng Xu, Wei Fu, Zhiyu Mei, Kaifeng Lyu, Yi Wu
ICLR 2025 · RNNs Are Not Transformers (Yet): The Key Bottleneck on In-Context Retrieval · Kaiyue Wen, Xingyu Dang, Kaifeng Lyu
ICLR 2025 · Safety Alignment Should Be Made More than Just a Few Tokens Deep · Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu, Xiao Ma, Subhrajit Roy, Ahmad Beirami, Prateek Mittal, Peter Henderson
ICLR 2025 · Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias · Rui Lu, Runzhe Wang, Kaifeng Lyu, Xitai Jiang, Gao Huang, Mengdi Wang
ICML 2025 · Weak-to-Strong Generalization Even in Random Feature Networks, Provably · Marko Medvedev, Kaifeng Lyu, Dingli Yu, Sanjeev Arora, Zhiyuan Li, Nathan Srebro
NeurIPSW 2024 · A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules · Kairong Luo, Haodong Wen, Shengding Hu, Zhenbo Sun, Zhiyuan Liu, Maosong Sun, Kaifeng Lyu, Wenguang Chen
ICLR 2024 · A Quadratic Synchronization Rule for Distributed Deep Learning · Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang
NeurIPSW 2024 · AI-Assisted Generation of Difficult Math Questions · Vedant Shah, Dingli Yu, Kaifeng Lyu, Simon Park, Jiatong Yu, Yinghui He, Nan Rosemary Ke, Michael Curtis Mozer, Yoshua Bengio, Sanjeev Arora, Anirudh Goyal
ICLR 2024 · Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking · Kaifeng Lyu, Jikai Jin, Zhiyuan Li, Simon Shaolei Du, Jason D. Lee, Wei Hu
ICLR 2024 · DistillSpec: Improving Speculative Decoding via Knowledge Distillation · Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal
NeurIPS 2024 · Keeping LLMs Aligned After Fine-Tuning: The Crucial Role of Prompt Templates · Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora
ICLRW 2024 · Keeping LLMs Aligned After Fine-Tuning: The Crucial Role of Prompt Templates · Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora
ICLRW 2024 · On the Representation Gap Between Modern RNNs and Transformers: The Curse of Memory Efficiency and the Fix of In-Context Retrieval · Kaiyue Wen, Xingyu Dang, Kaifeng Lyu
ICLR 2024 · The Marginal Value of Momentum for Small Learning Rate SGD · Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li
NeurIPSW 2023 · A Quadratic Synchronization Rule for Distributed Deep Learning · Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang
ICML 2023 · Understanding Incremental Learning of Gradient Descent: A Fine-Grained Analysis of Matrix Sensing · Jikai Jin, Zhiyuan Li, Kaifeng Lyu, Simon Shaolei Du, Jason D. Lee
ICLR 2023 · Why (and When) Does Local SGD Generalize Better than SGD? · Xinran Gu, Kaifeng Lyu, Longbo Huang, Sanjeev Arora
NeurIPS 2022 · New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound · Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora
NeurIPS 2022 · On the SDEs and Scaling Rules for Adaptive Gradient Algorithms · Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora
NeurIPS 2022 · Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction · Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
NeurIPSW 2022 · Why (and When) Does Local SGD Generalize Better than SGD? · Xinran Gu, Kaifeng Lyu, Longbo Huang, Sanjeev Arora
NeurIPS 2021 · Gradient Descent on Two-Layer Nets: Margin Maximization and Simplicity Bias · Kaifeng Lyu, Zhiyuan Li, Runzhe Wang, Sanjeev Arora
ICLR 2021 · Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning · Zhiyuan Li, Yuping Luo, Kaifeng Lyu
ICLR 2020 · Gradient Descent Maximizes the Margin of Homogeneous Neural Networks · Kaifeng Lyu, Jian Li
NeurIPS 2020 · Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate · Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora
ICLR 2019 · Theoretical Analysis of Auto Rate-Tuning by Batch Normalization · Sanjeev Arora, Zhiyuan Li, Kaifeng Lyu