Note on Learning Rate Schedules for Stochastic Optimization
Abstract
We present and compare learning rate schedules for stochastic gradient descent, a general algorithm which includes LMS, on-line backpropagation and k-means clustering as special cases. We introduce "search-then-converge" type schedules which outperform the classical constant and "running average" (1/t) schedules both in speed of convergence and quality of solution.
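The schedules named in the abstract can be sketched in code. The snippet below is a minimal illustration, assuming the commonly cited form eta(t) = eta0 / (1 + t/tau) for the search-then-converge schedule and eta0 / (t + 1) for the running-average schedule; the parameter names (eta0, tau) and the toy noisy quadratic objective are illustrative choices, not taken from the paper.

```python
import numpy as np

def constant_schedule(eta0):
    """Classical constant learning rate: eta(t) = eta0."""
    return lambda t: eta0

def running_average_schedule(eta0):
    """Classical 'running average' schedule: eta(t) = eta0 / (t + 1)."""
    return lambda t: eta0 / (t + 1)

def search_then_converge_schedule(eta0, tau):
    """Search-then-converge (assumed form): roughly constant for t << tau
    (search phase), decaying like eta0 * tau / t for t >> tau (converge phase)."""
    return lambda t: eta0 / (1 + t / tau)

def sgd(grad, x0, schedule, steps=2000, rng=np.random.default_rng(0)):
    """Plain stochastic gradient descent driven by a learning rate schedule."""
    x = x0
    for t in range(steps):
        g = grad(x) + rng.normal(scale=0.1)  # noisy gradient sample
        x = x - schedule(t) * g
    return x

if __name__ == "__main__":
    grad = lambda x: 2.0 * (x - 3.0)  # gradient of the toy objective (x - 3)^2
    sched = search_then_converge_schedule(eta0=0.5, tau=100)
    print(sgd(grad, x0=0.0, schedule=sched))  # converges toward 3.0
```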
Cite
Text
Darken and Moody. "Note on Learning Rate Schedules for Stochastic Optimization." Neural Information Processing Systems, 1990.
Markdown
[Darken and Moody. "Note on Learning Rate Schedules for Stochastic Optimization." Neural Information Processing Systems, 1990.](https://mlanthology.org/neurips/1990/darken1990neurips-note/)
BibTeX
@inproceedings{darken1990neurips-note,
title = {{Note on Learning Rate Schedules for Stochastic Optimization}},
author = {Darken, Christian and Moody, John E.},
booktitle = {Neural Information Processing Systems},
year = {1990},
pages = {832--838},
url = {https://mlanthology.org/neurips/1990/darken1990neurips-note/}
}