Gray, Gavia

6 publications

NeurIPS 2025. "Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-Training." Shane Bergsma, Nolan Simran Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness.

ICLR 2025. "Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs." Shane Bergsma, Nolan Simran Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness.

NeurIPSW 2024. "Empirical Upper Bounds for Unstructured Sparsity in Compute-Efficient Language Modeling." Esha Singh, Shane Bergsma, Nolan Simran Dey, Joel Hestness, Gavia Gray.

NeurIPS 2024. "Normalization Layer Per-Example Gradients Are Sufficient to Predict Gradient Noise Scale in Transformers." Gavia Gray, Aman Tiwari, Shane Bergsma, Joel Hestness.

NeurIPSW 2023. "Efficient and Approximate Per-Example Gradient Norms for Gradient Noise Scale." Gavia Gray, Anshul Samar, Joel Hestness.

NeurIPSW 2023. "Transferring Movement Understanding for Parkinson's Therapy by Generative Pre-Training." Emily Napier, Gavia Gray, Tristan Loria, Veronica Vuong, Michael Thaut, Sageev Oore.