ML Anthology
Tiwari, Aman
1 publication
NeurIPS 2024
Normalization Layer Per-Example Gradients Are Sufficient to Predict Gradient Noise Scale in Transformers
Gavia Gray, Aman Tiwari, Shane Bergsma, Joel Hestness