Kosson, Atli

11 publications

TMLR 2025. Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler. Aleksandr Dremov, Alexander Hägele, Atli Kosson, Martin Jaggi
ICMLW 2024. Analyzing & Eliminating Learning Rate Warmup in GPT Pre-Training. Atli Kosson, Bettina Messmer, Martin Jaggi
NeurIPS 2024. Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training. Atli Kosson, Bettina Messmer, Martin Jaggi
AAAI 2024. Ghost Noise for Regularizing Deep Neural Networks. Atli Kosson, Dongyang Fan, Martin Jaggi
ICML 2024. Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks. Atli Kosson, Bettina Messmer, Martin Jaggi
NeurIPS 2024. Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations. Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro Von Werra, Martin Jaggi
ICMLW 2024. Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations. Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro Von Werra, Martin Jaggi
NeurIPS 2023. Multiplication-Free Transformer Training via Piecewise Affine Operations. Atli Kosson, Martin Jaggi
NeurIPSW 2023. Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks. Atli Kosson, Bettina Messmer, Martin Jaggi
NeurIPSW 2023. Understanding the Role of Noisy Statistics in the Regularization Effect of Batch Normalization. Atli Kosson, Dongyang Fan, Martin Jaggi
NeurIPS 2019. Online Normalization for Training Neural Networks. Vitaliy Chiley, Ilya Sharapov, Atli Kosson, Urs Koster, Ryan Reece, Sofia Samaniego de la Fuente, Vishal Subbiah, Michael James