Diffusion Approximations for the Constant Learning Rate Backpropagation Algorithm and Resistence to Local Minima
Abstract
In this paper we discuss the asymptotic properties of the most commonly used variant of the backpropagation algorithm, in which network weights are trained by means of a local gradient descent on examples drawn randomly from a fixed training set, and the learning rate η of the gradient updates is held constant (simple backpropagation). Using stochastic approximation results, we show that for η → 0 this training process approaches batch training and provide results on the rate of convergence. Further, we show that for small η one can approximate simple backpropagation by the sum of a batch training process and a Gaussian diffusion, which is the unique solution to a linear stochastic differential equation. Using this approximation we indicate the reasons why simple backpropagation is less likely to get stuck in local minima than the batch training process and demonstrate this empirically on a number of examples.
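The short Python sketch below is not from the paper; it only illustrates the idea the abstract describes, on an assumed toy one-parameter least-squares model. It runs constant-learning-rate training on single randomly drawn examples ("simple backpropagation") next to a discretized version of the approximation: a batch gradient step plus a Gaussian noise term whose per-step scale matches the per-example gradient fluctuations. The model, step count, and discretization are assumptions made for illustration.

```python
# Illustrative sketch (assumed toy example, not the paper's experiments):
# compare simple backpropagation with a batch-step-plus-Gaussian-diffusion
# approximation on a one-parameter linear model y = w * x.
import numpy as np

rng = np.random.default_rng(0)

# Fixed training set.
x = rng.normal(size=200)
y = 1.5 * x + 0.3 * rng.normal(size=200)

def grad(w, xi, yi):
    """Gradient of the squared error 0.5 * (w * xi - yi)**2 with respect to w."""
    return (w * xi - yi) * xi

eta = 0.01      # constant learning rate
steps = 2000
w_sgd = 0.0     # simple backpropagation: one random example per update
w_diff = 0.0    # batch gradient step plus Gaussian diffusion term

for _ in range(steps):
    # Simple backpropagation: gradient on a single randomly drawn example.
    i = rng.integers(len(x))
    w_sgd -= eta * grad(w_sgd, x[i], y[i])

    # Diffusion approximation: full-batch gradient plus Gaussian noise whose
    # per-step standard deviation matches that of the per-example gradients.
    g = grad(w_diff, x, y)
    w_diff += -eta * g.mean() + eta * g.std() * rng.normal()

print(f"simple backpropagation estimate: {w_sgd:.3f}")
print(f"diffusion approximation estimate: {w_diff:.3f}")
print(f"least-squares solution:           {np.dot(x, y) / np.dot(x, x):.3f}")
```

For small η both trajectories stay close to the batch training path, and the residual Gaussian fluctuations are what allow the stochastic version to escape shallow local minima that would trap the batch process.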
Cite
Text
Finnoff. "Diffusion Approximations for the Constant Learning Rate Backpropagation Algorithm and Resistence to Local Minima." Neural Information Processing Systems, 1992.Markdown
[Finnoff. "Diffusion Approximations for the Constant Learning Rate Backpropagation Algorithm and Resistence to Local Minima." Neural Information Processing Systems, 1992.](https://mlanthology.org/neurips/1992/finnoff1992neurips-diffusion/)BibTeX
@inproceedings{finnoff1992neurips-diffusion,
title = {{Diffusion Approximations for the Constant Learning Rate Backpropagation Algorithm and Resistence to Local Minima}},
author = {Finnoff, William},
booktitle = {Neural Information Processing Systems},
year = {1992},
pages = {459-466},
url = {https://mlanthology.org/neurips/1992/finnoff1992neurips-diffusion/}
}