ML Anthology
Kosson, Atli
11 publications
TMLR
2025
Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler
Aleksandr Dremov, Alexander Hägele, Atli Kosson, Martin Jaggi
ICMLW
2024
Analyzing & Eliminating Learning Rate Warmup in GPT Pre-Training
Atli Kosson, Bettina Messmer, Martin Jaggi
NeurIPS
2024
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
Atli Kosson, Bettina Messmer, Martin Jaggi
AAAI
2024
Ghost Noise for Regularizing Deep Neural Networks
Atli Kosson, Dongyang Fan, Martin Jaggi
ICML
2024
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Atli Kosson, Bettina Messmer, Martin Jaggi
NeurIPS
2024
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro Von Werra, Martin Jaggi
ICMLW
2024
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro Von Werra, Martin Jaggi
NeurIPS
2023
Multiplication-Free Transformer Training via Piecewise Affine Operations
Atli Kosson, Martin Jaggi
NeurIPSW
2023
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Atli Kosson, Bettina Messmer, Martin Jaggi
NeurIPSW
2023
Understanding the Role of Noisy Statistics in the Regularization Effect of Batch Normalization
Atli Kosson, Dongyang Fan, Martin Jaggi
NeurIPS
2019
Online Normalization for Training Neural Networks
Vitaliy Chiley, Ilya Sharapov, Atli Kosson, Urs Koster, Ryan Reece, Sofia Samaniego de la Fuente, Vishal Subbiah, Michael James