ML Anthology
Marek, Martin
1 publication
NeurIPS 2025
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
Martin Marek, Sanae Lotfi, Aditya Somasundaram, Andrew Gordon Wilson, Micah Goldblum