“Plus/minus the Learning Rate”: Easy and Scalable Statistical Inference with SGD

Abstract

In this paper, we develop a statistical inference procedure using stochastic gradient descent (SGD)-based confidence intervals. These intervals are of the simplest possible form: $\theta_{N,j} \pm 2\sqrt{\gamma/N}$, where $\theta_N$ is the SGD estimate of model parameters $\theta$ over $N$ data points, and $\gamma$ is the learning rate. This construction relies only on a proper selection of the learning rate to ensure the standard SGD conditions for $O(1/n)$ convergence. The procedure performs well in our empirical evaluations, achieving near-nominal coverage intervals while scaling up to 20$\times$ as many parameters as other SGD-based inference methods. We also demonstrate our method’s practical significance on modeling adverse events in emergency general surgery patients using a novel dataset from the Hospital of the University of Pennsylvania.
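As a rough illustration of the interval construction described in the abstract, the following is a minimal sketch, not the authors' implementation: it runs plain single-pass SGD with a constant learning rate $\gamma$ on a synthetic linear-regression problem and then forms per-coordinate intervals $\theta_{N,j} \pm 2\sqrt{\gamma/N}$. The synthetic data, squared-error loss, the particular value of $\gamma$, and the use of the last (non-averaged) iterate are assumptions made here for illustration; the paper's learning-rate conditions and any averaging or diagnostics it prescribes are not reproduced.

import numpy as np

# Illustrative sketch only (assumptions: synthetic linear-regression data,
# squared-error loss, last SGD iterate as theta_N, gamma chosen ad hoc).
rng = np.random.default_rng(0)
N, d = 10_000, 5
theta_true = rng.normal(size=d)
X = rng.normal(size=(N, d))
y = X @ theta_true + rng.normal(size=N)

gamma = 0.01          # constant learning rate (assumed; the paper gives conditions for choosing it)
theta = np.zeros(d)   # SGD iterate

# One pass of SGD over the N data points using the per-example squared-error gradient.
for i in range(N):
    grad = (X[i] @ theta - y[i]) * X[i]
    theta -= gamma * grad

# "Plus/minus the learning rate" intervals: theta_{N,j} +/- 2*sqrt(gamma/N).
half_width = 2 * np.sqrt(gamma / N)
lower, upper = theta - half_width, theta + half_width
print(np.column_stack([lower, theta, upper]))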

Cite

Text

Chee et al. "“Plus/minus the Learning Rate”: Easy and Scalable Statistical Inference with SGD." Artificial Intelligence and Statistics, 2023.

Markdown

[Chee et al. "“Plus/minus the Learning Rate”: Easy and Scalable Statistical Inference with SGD." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/chee2023aistats-plus/)

BibTeX

@inproceedings{chee2023aistats-plus,
  title     = {{“Plus/minus the Learning Rate”: Easy and Scalable Statistical Inference with SGD}},
  author    = {Chee, Jerry and Kim, Hwanwoo and Toulis, Panos},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {2285--2309},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/chee2023aistats-plus/}
}