Schmidt, Mark
72 publications
NeurIPSW 2024. BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks
NeurIPS 2024. Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
ECML-PKDD 2023. Fast Convergence of Random Reshuffling Under Over-Parameterization and the Polyak-Łojasiewicz Condition
NeurIPSW 2023. Why Adam Outperforms Gradient Descent on Language Models: A Heavy-Tailed Class Imbalance Problem
NeurIPSW 2022. Fast Convergence of Greedy 2-Coordinate Updates for Optimizing with an Equality Constraint
NeurIPSW 2022. Fast Convergence of Random Reshuffling Under Interpolation and the Polyak-Łojasiewicz Condition
NeurIPSW 2022. Practical Structured Riemannian Optimization with Momentum by Using Generalized Normal Coordinates
NeurIPSW 2021. An Empirical Study of Non-Uniform Sampling in Off-Policy Reinforcement Learning for Continuous Control
NeurIPS 2020. Regret Bounds Without Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses
AISTATS 2019. Distributed Maximization of "Submodular Plus Diversity" Functions for Multi-Label Feature Selection on Huge Datasets
AISTATS 2019. Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron
ECML-PKDD 2018. MASAGA: A Linearly-Convergent Stochastic First-Order Method for Optimization on Manifolds
NeurIPS 2018. SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient
ECML-PKDD 2016. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
NeurIPS 2012. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets
UAI 2011. Generalized Fast Approximate Energy Minimization via Graph Cuts: α-Expansion β-Shrink Moves