“Plus/minus the Learning Rate”: Easy and Scalable Statistical Inference with SGD
Abstract
In this paper, we develop a statistical inference procedure using stochastic gradient descent (SGD)-based confidence intervals. These intervals are of the simplest possible form: $\theta_{N,j} \pm 2\sqrt{\gamma/N}$, where $\theta_N$ is the SGD estimate of model parameters $\theta$ over $N$ data points, and $\gamma$ is the learning rate. This construction relies only on a proper selection of the learning rate to ensure the standard SGD conditions for O(1/n) convergence. The procedure performs well in our empirical evaluations, achieving near-nominal coverage intervals while scaling up to 20$\times$ as many parameters as other SGD-based inference methods. We also demonstrate our method’s practical significance on modeling adverse events in emergency general surgery patients using a novel dataset from the Hospital of the University of Pennsylvania.
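As a rough illustration of the interval construction described above, the sketch below runs plain SGD on a synthetic least-squares problem and then forms $\theta_{N,j} \pm 2\sqrt{\gamma/N}$ intervals. The synthetic data, single-pass schedule, and the specific learning-rate value are assumptions for illustration only; the paper itself specifies the conditions a proper learning rate must satisfy, and this is not the authors' implementation.

```python
import numpy as np

# Minimal sketch (assumed setup, not the authors' code): SGD on least squares,
# followed by "plus/minus the learning rate" intervals theta_N[j] +/- 2*sqrt(gamma/N).

rng = np.random.default_rng(0)
N, d = 10_000, 5
theta_true = rng.normal(size=d)
X = rng.normal(size=(N, d))
y = X @ theta_true + rng.normal(size=N)

gamma = 0.01              # constant learning rate (illustrative value; see paper for selection)
theta = np.zeros(d)       # SGD iterate
for i in range(N):        # one pass over the N data points
    xi, yi = X[i], y[i]
    grad = (xi @ theta - yi) * xi   # per-sample least-squares gradient
    theta -= gamma * grad

# Confidence intervals of the form theta_N[j] +/- 2*sqrt(gamma/N)
half_width = 2.0 * np.sqrt(gamma / N)
lower, upper = theta - half_width, theta + half_width
for j in range(d):
    print(f"theta[{j}]: {theta[j]:+.3f}  CI = [{lower[j]:+.3f}, {upper[j]:+.3f}]")
```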
Cite
Text
Chee et al. "“Plus/minus the Learning Rate”: Easy and Scalable Statistical Inference with SGD." Artificial Intelligence and Statistics, 2023.

Markdown
[Chee et al. "“Plus/minus the Learning Rate”: Easy and Scalable Statistical Inference with SGD." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/chee2023aistats-plus/)

BibTeX
@inproceedings{chee2023aistats-plus,
title = {{“Plus/minus the Learning Rate”: Easy and Scalable Statistical Inference with SGD}},
author = {Chee, Jerry and Kim, Hwanwoo and Toulis, Panos},
booktitle = {Artificial Intelligence and Statistics},
year = {2023},
pages = {2285--2309},
volume = {206},
url = {https://mlanthology.org/aistats/2023/chee2023aistats-plus/}
}