Tiwari, Aman

1 publications

NeurIPS 2024 Normalization Layer Per-Example Gradients Are Sufficient to Predict Gradient Noise Scale in Transformers Gavia Gray, Aman Tiwari, Shane Bergsma, Joel Hestness