Using Curvature Information for Fast Stochastic Search

Abstract

We present an algorithm for fast stochastic gradient descent that uses a nonlinear adaptive momentum scheme to optimize the late-time convergence rate. The algorithm makes effective use of curvature information, requires only O(n) storage and computation, and delivers convergence rates close to the theoretical optimum. We demonstrate the technique on linear and large nonlinear backprop networks.
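For orientation, the sketch below shows plain stochastic gradient descent with a fixed momentum coefficient on a small linear least-squares problem. It is not the adaptive momentum scheme of the paper, which the abstract does not specify; it only illustrates the kind of O(n) baseline such a scheme extends, where the sole extra state is one velocity vector the same size as the weights. The function name `sgd_momentum`, the hyperparameters `lr` and `beta`, and the synthetic data are all assumptions made for illustration.

```python
import random


def sgd_momentum(data, n_features, lr=0.01, beta=0.9, epochs=20, seed=0):
    """Plain SGD with a fixed momentum coefficient for a linear model.

    Hypothetical baseline only: beta is constant here, whereas the paper
    describes an adaptive, curvature-aware momentum.
    """
    rng = random.Random(seed)
    w = [0.0] * n_features  # weights
    v = [0.0] * n_features  # velocity: the only extra O(n) storage
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order
        for i in order:
            x, y = data[i]
            # Prediction error of the linear model on a single sample.
            err = sum(wj * xj for wj, xj in zip(w, x)) - y
            for j in range(n_features):
                # Gradient of 0.5 * err**2 with respect to w[j] is err * x[j].
                v[j] = beta * v[j] - lr * err * x[j]
                w[j] += v[j]
    return w


# Synthetic, noise-free linear data (assumed for illustration).
data_rng = random.Random(1)
w_true = [1.0, -2.0, 0.5]
data = []
for _ in range(200):
    x = [data_rng.gauss(0.0, 1.0) for _ in w_true]
    data.append((x, sum(wj * xj for wj, xj in zip(w_true, x))))

w_hat = sgd_momentum(data, n_features=len(w_true))
```

Each update touches every weight once, so the per-step cost and the storage both grow linearly with the number of parameters.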

Cite

Text

Orr and Leen. "Using Curvature Information for Fast Stochastic Search." Neural Information Processing Systems, 1996.

Markdown

[Orr and Leen. "Using Curvature Information for Fast Stochastic Search." Neural Information Processing Systems, 1996.](https://mlanthology.org/neurips/1996/orr1996neurips-using/)

BibTeX

@inproceedings{orr1996neurips-using,
  title     = {{Using Curvature Information for Fast Stochastic Search}},
  author    = {Orr, Genevieve B. and Leen, Todd K.},
  booktitle = {Neural Information Processing Systems},
  year      = {1996},
  pages     = {606-612},
  url       = {https://mlanthology.org/neurips/1996/orr1996neurips-using/}
}