Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

Abstract

We propose a new stochastic optimization framework for empirical risk minimization problems such as those that arise in machine learning. Traditional approaches, such as (mini-batch) stochastic gradient descent (SGD), utilize an unbiased gradient estimator of the empirical average loss. In contrast, we develop a computationally efficient method to construct a gradient estimator that is purposely biased toward the observations with higher current losses. On the theory side, we show that the proposed method minimizes a new ordered modification of the empirical average loss, and is guaranteed to converge at a sublinear rate to a global optimum for convex loss and to a critical point for weakly convex (non-convex) loss. Furthermore, we prove a new generalization bound for the proposed algorithm. On the empirical side, numerical experiments show that the proposed method consistently reduces test errors compared with standard mini-batch SGD across various models, including SVMs, logistic regression, and deep learning.
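The core idea described above (averaging gradients only over the samples with the largest current losses in each mini-batch) can be sketched as follows. This is a minimal illustration, not the paper's exact pseudocode: it assumes a PyTorch-style model, a per-sample loss function (reduction='none'), and a hypothetical hyperparameter q controlling how many top-loss samples are kept per mini-batch.

    import torch

    def ordered_sgd_step(model, per_sample_loss_fn, optimizer, x_batch, y_batch, q):
        # One update using only the q highest-loss samples in the mini-batch
        # (a sketch of the "biased toward higher current losses" idea).
        optimizer.zero_grad()
        losses = per_sample_loss_fn(model(x_batch), y_batch)  # shape: (batch_size,)
        # Keep only the q largest per-sample losses.
        top_losses, _ = torch.topk(losses, k=q)
        ordered_loss = top_losses.mean()
        ordered_loss.backward()   # gradient of the "ordered" loss
        optimizer.step()
        return ordered_loss.item()

For example, per_sample_loss_fn = torch.nn.CrossEntropyLoss(reduction='none') with optimizer = torch.optim.SGD(model.parameters(), lr=0.1); setting q equal to the mini-batch size recovers standard mini-batch SGD.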

Cite

Text

Kawaguchi and Lu. "Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization." Artificial Intelligence and Statistics, 2020.

Markdown

[Kawaguchi and Lu. "Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/kawaguchi2020aistats-ordered/)

BibTeX

@inproceedings{kawaguchi2020aistats-ordered,
  title     = {{Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization}},
  author    = {Kawaguchi, Kenji and Lu, Haihao},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {669-679},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/kawaguchi2020aistats-ordered/}
}