Hestness, Joel
9 publications
NeurIPSW
2024
Empirical Upper Bounds for Unstructured Sparsity in Compute-Efficient Language Modeling
NeurIPS
2024
Normalization Layer Per-Example Gradients Are Sufficient to Predict Gradient Noise Scale in Transformers