Douillard, Arthur
13 publications
NeurIPS
2025
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
NeurIPS
2025
Understanding Outer Optimizers in Local SGD: Learning Rates, Momentum, and Acceleration