Periodic Step Size Adaptation for Single Pass On-Line Learning
Abstract
It has been established that the second-order stochastic gradient descent (2SGD) method can potentially achieve generalization performance as good as the empirical optimum in a single pass (i.e., epoch) through the training examples. However, 2SGD requires computing the inverse of the Hessian matrix of the loss function, which is prohibitively expensive. This paper presents Periodic Step-size Adaptation (PSA), which approximates the Jacobian matrix of the update mapping and exploits a linear relation between the Jacobian and the Hessian to approximate the Hessian periodically, achieving near-optimal results in experiments on a wide variety of models and tasks.
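The sketch below illustrates the general idea described in the abstract, not the authors' exact algorithm: per-coordinate SGD step sizes are re-estimated every `period` updates from a diagonal secant approximation of the Jacobian of the SGD mapping M(w) = w - eta * g(w), using the linear relation J = I - eta * H to recover a Hessian estimate. Function names, the update period, the 2SGD-style 1/(t*H) schedule, and the clipping bounds are all illustrative assumptions.

```python
import numpy as np

def psa_sgd_sketch(grad, w0, examples, eta0=0.1, period=50,
                   eta_min=1e-6, eta_max=1.0, eps=1e-12):
    """Single-pass SGD whose per-coordinate step sizes are adapted
    periodically from a diagonal Jacobian estimate (hypothetical sketch)."""
    w = np.asarray(w0, dtype=float).copy()
    eta = np.full_like(w, eta0)          # per-coordinate step sizes
    prev_w, prev_next_w = None, None     # snapshots for the secant estimate
    for t, x in enumerate(examples, start=1):
        g = grad(w, x)                   # stochastic gradient on one example
        w_next = w - eta * g             # SGD mapping M(w) = w - eta * g(w)
        if t % period == 0:
            if prev_w is not None:
                # Diagonal secant estimate of the Jacobian of M:
                #   J_i ≈ (M(w)_i - M(prev_w)_i) / (w_i - prev_w_i)
                dw = w - prev_w
                dM = w_next - prev_next_w
                valid = np.abs(dw) > eps
                J = np.where(valid, dM / np.where(valid, dw, 1.0), 1.0)
                # Linear relation J = I - eta * H  =>  H_ii ≈ (1 - J_i) / eta_i
                H = np.maximum((1.0 - J) / eta, eps)
                # 2SGD-style schedule eta_i ≈ 1 / (t * H_ii), kept in safe bounds;
                # coordinates without a usable secant keep their current step size.
                eta = np.where(valid, np.clip(1.0 / (t * H), eta_min, eta_max), eta)
            prev_w, prev_next_w = w.copy(), w_next.copy()
        w = w_next
    return w

# Toy usage: one pass of least-squares regression on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=5000)
sq_grad = lambda w, xy: (xy[0] @ w - xy[1]) * xy[0]
w_hat = psa_sgd_sketch(sq_grad, np.zeros(3), zip(X, y))
```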
Cite
Text
Hsu et al. "Periodic Step Size Adaptation for Single Pass On-Line Learning." Neural Information Processing Systems, 2009.
Markdown
[Hsu et al. "Periodic Step Size Adaptation for Single Pass On-Line Learning." Neural Information Processing Systems, 2009.](https://mlanthology.org/neurips/2009/hsu2009neurips-periodic/)
BibTeX
@inproceedings{hsu2009neurips-periodic,
title = {{Periodic Step Size Adaptation for Single Pass On-Line Learning}},
author = {Hsu, Chun-nan and Chang, Yu-ming and Huang, Hanshen and Lee, Yuh-jye},
booktitle = {Neural Information Processing Systems},
year = {2009},
pages = {763-771},
url = {https://mlanthology.org/neurips/2009/hsu2009neurips-periodic/}
}