Ahn, Kwangjun

25 publications

ICLR 2025 Does SGD Really Happen in Tiny Subspaces? Minhak Song, Kwangjun Ahn, Chulhee Yun
ICML 2025 General Framework for Online-to-Nonconvex Conversion: Schedule-Free SGD Is Also Effective for Nonconvex Optimization. Kwangjun Ahn, Gagik Magakyan, Ashok Cutkosky
ICLR 2025 The Belief State Transformer. Edward S. Hu, Kwangjun Ahn, Qinghua Liu, Haoran Xu, Manan Tomar, Ada Langford, Dinesh Jayaraman, Alex Lamb, John Langford
NeurIPS 2025 Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training. Minhak Song, Beomhan Baek, Kwangjun Ahn, Chulhee Yun
NeurIPS 2024 Adam with Model Exponential Moving Average Is Effective for Nonconvex Optimization. Kwangjun Ahn, Ashok Cutkosky
ICMLW 2024 Does SGD Really Happen in Tiny Subspaces? Minhak Song, Kwangjun Ahn, Chulhee Yun
ICML 2024 How to Escape Sharp Minima with Random Perturbations. Kwangjun Ahn, Ali Jadbabaie, Suvrit Sra
ICLR 2024 Linear Attention Is (maybe) All You Need (to Understand Transformer Optimization). Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra
ICML 2024 Understanding Adam Optimizer via Online Learning of Updates: Adam Is FTRL in Disguise. Kwangjun Ahn, Zhiyu Zhang, Yunbum Kook, Yan Dai
JMLR 2023 A Unified Approach to Controlling Implicit Regularization via Mirror Descent. Haoyuan Sun, Khashayar Gatmiry, Kwangjun Ahn, Navid Azizan
NeurIPS 2023 Learning Threshold Neurons via Edge of Stability. Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Yin Tat Lee, Felipe Suarez, Yi Zhang
NeurIPSW 2023 Linear Attention Is (maybe) All You Need (to Understand Transformer Optimization). Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra
L4DC 2023 Model Predictive Control via On-Policy Imitation Learning. Kwangjun Ahn, Zakaria Mhammedi, Horia Mania, Zhang-Wei Hong, Ali Jadbabaie
NeurIPSW 2023 SpecTr++: Improved Transport Plans for Speculative Decoding of Large Language Models. Kwangjun Ahn, Ahmad Beirami, Ziteng Sun, Ananda Theertha Suresh
NeurIPS 2023 The Crucial Role of Normalization in Sharpness-Aware Minimization. Yan Dai, Kwangjun Ahn, Suvrit Sra
NeurIPS 2023 Transformers Learn to Implement Preconditioned Gradient Descent for In-Context Learning. Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, Suvrit Sra
ICML 2022 Agnostic Learnability of Halfspaces via Logistic Loss. Ziwei Ji, Kwangjun Ahn, Pranjal Awasthi, Satyen Kale, Stefani Karp
NeurIPS 2022 Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently. Haoyuan Sun, Kwangjun Ahn, Christos Thrampoulidis, Navid Azizan
NeurIPS 2022 Reproducibility in Optimization: Theoretical Framework and Limits. Kwangjun Ahn, Prateek Jain, Ziwei Ji, Satyen Kale, Praneeth Netrapalli, Gil I. Shamir
ICML 2022 Understanding the Unstable Convergence of Gradient Descent. Kwangjun Ahn, Jingzhao Zhang, Suvrit Sra
NeurIPS 2021 Efficient Constrained Sampling via the Mirror-Langevin Algorithm. Kwangjun Ahn, Sinho Chewi
COLT 2021 Optimal Dimension Dependence of the Metropolis-Adjusted Langevin Algorithm. Sinho Chewi, Chen Lu, Kwangjun Ahn, Xiang Cheng, Thibaut Le Gouic, Philippe Rigollet
COLT 2020 From Nesterov’s Estimate Sequence to Riemannian Acceleration. Kwangjun Ahn, Suvrit Sra
NeurIPS 2020 SGD with Shuffling: Optimal Rates Without Component Convexity and Large Epoch Requirements. Kwangjun Ahn, Chulhee Yun, Suvrit Sra
NeurIPS 2018 Binary Rating Estimation with Graph Side Information. Kwangjun Ahn, Kangwook Lee, Hyunseung Cha, Changho Suh