Kunstner, Frederik
13 publications
NeurIPS
2025
Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models Under Zipf’s Law
NeurIPS
2024
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
NeurIPSW
2024
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models