Kunstner, Frederik

13 publications

NeurIPS 2025. Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models Under Zipf’s Law. Frederik Kunstner, Francis Bach.
NeurIPS 2024. Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models. Frederik Kunstner, Alan Milligan, Robin Yadav, Mark Schmidt, Alberto Bietti.
NeurIPSW 2024. Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models. Frederik Kunstner, Alan Milligan, Robin Yadav, Mark Schmidt, Alberto Bietti.
NeurIPSW 2024. Normalization Matters for Optimization Performance on Graph Neural Networks. Alan Milligan, Frederik Kunstner, Hamed Shirzad, Mark Schmidt, Danica J. Sutherland.
ICLR 2023. Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be. Frederik Kunstner, Jacques Chen, Jonathan Wilder Lavington, Mark Schmidt.
NeurIPS 2023. Searching for Optimal Per-Coordinate Step-Sizes with Multidimensional Backtracking. Frederik Kunstner, Victor Sanches Portella, Mark Schmidt, Nicholas Harvey.
NeurIPSW 2023. Variance Reduced Model Based Methods: New Rates and Adaptive Step Sizes. Robert M. Gower, Frederik Kunstner, Mark Schmidt.
NeurIPSW 2023. Why Adam Outperforms Gradient Descent on Language Models: A Heavy-Tailed Class Imbalance Problem. Robin Yadav, Frederik Kunstner, Mark Schmidt, Alberto Bietti.
IJCAI 2022. Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent (Extended Abstract). Frederik Kunstner, Raunak Kumar, Mark Schmidt.
AISTATS 2021. Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent. Frederik Kunstner, Raunak Kumar, Mark Schmidt.
ICLR 2020. BackPACK: Packing More into Backprop. Felix Dangel, Frederik Kunstner, Philipp Hennig.
NeurIPS 2019. Limitations of the Empirical Fisher Approximation for Natural Gradient Descent. Frederik Kunstner, Philipp Hennig, Lukas Balles.
NeurIPS 2018. SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient. Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, Mohammad Emtiyaz Khan.