Yun, Chulhee

48 publications

ICLR 2025. Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count. Hanseul Cho, Jaeyoung Cha, Srinadh Bhojanapalli, Chulhee Yun.
ICLR 2025. Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification. Hyunji Jung, Hanseul Cho, Chulhee Yun.
ICLR 2025. Does SGD Really Happen in Tiny Subspaces? Minhak Song, Kwangjun Ahn, Chulhee Yun.
NeurIPS 2025. From Linear to Nonlinear: Provable Weak-to-Strong Generalization Through Feature Learning. Junsoo Oh, Jerry Song, Chulhee Yun.
ICML 2025. Incremental Gradient Descent with Small Epoch Counts Is Surprisingly Slow on Ill-Conditioned Problems. Yujun Kim, Jaeyoung Cha, Chulhee Yun.
ICML 2025. Lightweight Dataset Pruning Without Full Training via Example Difficulty and Prediction Uncertainty. Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun.
ICLRW 2025. Lightweight Dataset Pruning Without Full Training via Example Difficulty and Prediction Uncertainty. Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun.
ICLR 2025. Parameter Expanded Stochastic Gradient Markov Chain Monte Carlo. Hyunsu Kim, Giung Nam, Chulhee Yun, Hongseok Yang, Juho Lee.
ICML 2025. Provable Benefit of Random Permutations over Uniform Sampling in Stochastic Coordinate Descent. Donghwa Kim, Jaewook Lee, Chulhee Yun.
NeurIPS 2025. The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets. Yujun Kim, Chaewon Moon, Chulhee Yun.
NeurIPS 2025. Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training. Minhak Song, Beomhan Baek, Kwangjun Ahn, Chulhee Yun.
ICML 2025. Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More. Geonhui Yoo, Minhak Song, Chulhee Yun.
ICMLW 2024. DASH: Warm-Starting Neural Network Training Without Loss of Plasticity Under Stationarity. Baekrok Shin, Junsoo Oh, Hanseul Cho, Chulhee Yun.
NeurIPS 2024. DASH: Warm-Starting Neural Network Training in Stationary Settings Without Loss of Plasticity. Baekrok Shin, Junsoo Oh, Hanseul Cho, Chulhee Yun.
ICMLW 2024. Does SGD Really Happen in Tiny Subspaces? Minhak Song, Kwangjun Ahn, Chulhee Yun.
ICML 2024. Fundamental Benefit of Alternating Updates in Minimax Optimization. Jaewook Lee, Hanseul Cho, Chulhee Yun.
ICMLW 2024. Gradient Descent with Polyak’s Momentum Finds Flatter Minima via Large Catapults. Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun.
ICLR 2024. Linear Attention Is (maybe) All You Need (to Understand Transformer Optimization). Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra.
NeurIPS 2024. Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure. Hanseul Cho, Jaeyoung Cha, Pranjal Awasthi, Srinadh Bhojanapalli, Anupam Gupta, Chulhee Yun.
ICMLW 2024. Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers. Hanseul Cho, Jaeyoung Cha, Pranjal Awasthi, Srinadh Bhojanapalli, Anupam Gupta, Chulhee Yun.
NeurIPS 2024. Provable Benefit of Cutout and CutMix for Feature Learning. Junsoo Oh, Chulhee Yun.
ICMLW 2024. Provable Benefit of Cutout and CutMix for Feature Learning. Junsoo Oh, Chulhee Yun.
NeurIPS 2024. Stochastic Extragradient with Flip-Flop Shuffling & Anchoring: Provable Improvements. Jiseok Chae, Chulhee Yun, Donghwan Kim.
NeurIPS 2023. Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint. Junghyun Lee, Hanseul Cho, Se-Young Yun, Chulhee Yun.
NeurIPSW 2023. Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study. Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun.
NeurIPSW 2023. Linear Attention Is (maybe) All You Need (to Understand Transformer Optimization). Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra.
ICML 2023. On the Training Instability of Shuffling SGD with Batch Normalization. David Xing Wu, Chulhee Yun, Suvrit Sra.
NeurIPS 2023. PLASTIC: Improving Input and Label Plasticity for Sample Efficient Reinforcement Learning. Hojoon Lee, Hanseul Cho, Hyunseung Kim, Daehoon Gwak, Joonkee Kim, Jaegul Choo, Se-Young Yun, Chulhee Yun.
NeurIPS 2023. Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima. Dongkuk Si, Chulhee Yun.
ICML 2023. Provable Benefit of Mixup for Finding Optimal Decision Boundaries. Junsoo Oh, Chulhee Yun.
ICLR 2023. SGDA with Shuffling: Faster Convergence for Nonconvex-PŁ Minimax Optimization. Hanseul Cho, Chulhee Yun.
ICML 2023. Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond. Jaeyoung Cha, Jaewook Lee, Chulhee Yun.
NeurIPS 2023. Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory. Minhak Song, Chulhee Yun.
ICLR 2022. Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond. Chulhee Yun, Shashank Rajput, Suvrit Sra.
ICLR 2021. A Unifying View on Implicit Bias in Training Linear Neural Networks. Chulhee Yun, Shankar Krishnan, Hossein Mobahi.
ICLR 2021. Minimum Width for Universal Approximation. Sejun Park, Chulhee Yun, Jaeho Lee, Jinwoo Shin.
COLT 2021. Open Problem: Can Single-Shuffle SGD Be Better than Reshuffling SGD and GD? Chulhee Yun, Suvrit Sra, Ali Jadbabaie.
COLT 2021. Provable Memorization via Deep Neural Networks Using Sub-Linear Parameters. Sejun Park, Jaeho Lee, Chulhee Yun, Jinwoo Shin.
ICLR 2020. Are Transformers Universal Approximators of Sequence-to-Sequence Functions? Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar.
ICML 2020. Low-Rank Bottleneck in Multi-Head Attention Models. Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar.
NeurIPS 2020. O(n) Connections Are Expressive Enough: Universal Approximability of Sparse Transformers. Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar.
NeurIPS 2020. SGD with Shuffling: Optimal Rates Without Component Convexity and Large Epoch Requirements. Kwangjun Ahn, Chulhee Yun, Suvrit Sra.
NeurIPS 2019. Are Deep ResNets Provably Better than Linear Predictors? Chulhee Yun, Suvrit Sra, Ali Jadbabaie.
ICLR 2019. Efficiently Testing Local Optimality and Escaping Saddles for ReLU Networks. Chulhee Yun, Suvrit Sra, Ali Jadbabaie.
ICLR 2019. Small Nonlinearities in Activation Functions Create Bad Local Minima in Neural Networks. Chulhee Yun, Suvrit Sra, Ali Jadbabaie.
NeurIPS 2019. Small ReLU Networks Are Powerful Memorizers: A Tight Analysis of Memorization Capacity. Chulhee Yun, Suvrit Sra, Ali Jadbabaie.
ICLR 2018. Global Optimality Conditions for Deep Neural Networks. Chulhee Yun, Suvrit Sra, Ali Jadbabaie.
COLT 2018. Minimax Bounds on Stochastic Batched Convex Optimization. John C. Duchi, Feng Ruan, Chulhee Yun.