Soboleva, Daria

2 publications

NeurIPS 2025. Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-Training. Shane Bergsma, Nolan Simran Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness
ICLR 2025. Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs. Shane Bergsma, Nolan Simran Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness