High Dimensional First Order Mini-Batch Algorithms on Quadratic Problems

Abstract

We analyze the dynamics of general mini-batch first order algorithms on the $\ell_2$ regularized least squares problem when the number of samples and dimensions are large. This includes stochastic gradient descent (SGD), stochastic Nesterov (convex/strongly convex), and stochastic momentum. In this setting, we show that the dynamics of these algorithms concentrate to a deterministic discrete Volterra equation $\Psi$ in the high-dimensional limit. In turn, we show that $\Psi$ captures the behaviour of general mini-batch first order algorithms under any quadratic statistic $\mathcal{R}: \mathbb{R}^d \to \mathbb{R}$, including but not limited to: training loss, excess risk for empirical risk minimization (in-distribution), and generalization error.
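To make the setting concrete, below is a minimal sketch (not the paper's exact setup or its Volterra analysis) of mini-batch SGD on an $\ell_2$ regularized least squares problem. All names and values (`A`, `b`, `delta`, `batch_size`, `lr`) are illustrative assumptions.

```python
# Mini-batch SGD sketch for min_x (1/2n)||A x - b||^2 + (delta/2)||x||^2.
import numpy as np

rng = np.random.default_rng(0)

n, d = 2000, 500          # samples and dimensions, both large in the high-dimensional regime
A = rng.standard_normal((n, d)) / np.sqrt(d)
x_star = rng.standard_normal(d)
b = A @ x_star + 0.1 * rng.standard_normal(n)

delta = 1e-2              # l2 regularization strength
lr = 0.5                  # step size
batch_size = 32
x = np.zeros(d)

def full_loss(x):
    """Regularized least squares objective (one example of a quadratic statistic of the iterate)."""
    return 0.5 * np.mean((A @ x - b) ** 2) + 0.5 * delta * np.sum(x ** 2)

for step in range(500):
    idx = rng.choice(n, size=batch_size, replace=False)            # sample a mini-batch
    grad = A[idx].T @ (A[idx] @ x - b) / batch_size + delta * x    # stochastic gradient
    x -= lr * grad

print("final training loss:", full_loss(x))
```

The training loss tracked here is one quadratic statistic of the iterates; the paper's result is that such statistics concentrate around a deterministic Volterra equation as $n$ and $d$ grow.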

Cite

Text

Cheng et al. "High Dimensional First Order Mini-Batch Algorithms on Quadratic Problems." NeurIPS 2024 Workshops: OPT, 2024.

Markdown

[Cheng et al. "High Dimensional First Order Mini-Batch Algorithms on Quadratic Problems." NeurIPS 2024 Workshops: OPT, 2024.](https://mlanthology.org/neuripsw/2024/cheng2024neuripsw-high/)

BibTeX

@inproceedings{cheng2024neuripsw-high,
  title     = {{High Dimensional First Order Mini-Batch Algorithms on Quadratic Problems}},
  author    = {Cheng, Andrew Nicholas and Lee, Kiwon and Paquette, Courtney},
  booktitle = {NeurIPS 2024 Workshops: OPT},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/cheng2024neuripsw-high/}
}