Song, Minhak

7 publications

ICLR 2025 | Does SGD Really Happen in Tiny Subspaces? | Minhak Song, Kwangjun Ahn, Chulhee Yun
NeurIPS 2025 | Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training | Minhak Song, Beomhan Baek, Kwangjun Ahn, Chulhee Yun
ICML 2025 | Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More | Geonhui Yoo, Minhak Song, Chulhee Yun
ICMLW 2024 | Does SGD Really Happen in Tiny Subspaces? | Minhak Song, Kwangjun Ahn, Chulhee Yun
ICLR 2024 | Linear Attention Is (maybe) All You Need (to Understand Transformer Optimization) | Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra
NeurIPSW 2023 | Linear Attention Is (maybe) All You Need (to Understand Transformer Optimization) | Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra
NeurIPS 2023 | Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory | Minhak Song, Chulhee Yun