Diffusion Approximations for the Constant Learning Rate Backpropagation Algorithm and Resistence to Local Minima

Abstract

In this paper we discuss the asymptotic properties of the most commonly used variant of the backpropagation algorithm, in which network weights are trained by means of a local gradient descent on examples drawn randomly from a fixed training set, and the learning rate η of the gradient updates is held constant (simple backpropagation). Using stochastic approximation results, we show that for η → 0 this training process approaches a batch training and provide results on the rate of convergence. Further, we show that for small η one can approximate simple backpropagation by the sum of a batch training process and a Gaussian diffusion which is the unique solution to a linear stochastic differential equation. Using this approximation we indicate the reasons why simple backpropagation is less likely to get stuck in local minima than the batch training process and demonstrate this empirically on a number of examples.
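To make the contrast in the abstract concrete, the following is a minimal, hypothetical Python sketch (not taken from the paper) comparing batch training with simple backpropagation at a constant learning rate η on a one-parameter toy error surface that has a shallow local minimum and a deeper global one. The toy surface, the zero-mean "training set" noise model, the value η = 0.02 and all variable names are illustrative assumptions; the per-example gradients average exactly to the batch gradient, so the per-update noise plays the role of the Gaussian diffusion term described above.

import numpy as np

rng = np.random.default_rng(0)

# Toy batch error surface E(w) = w**4 - 2*w**2 + 0.5*w (assumed for illustration):
# a shallow local minimum near w ~ +0.9 and a deeper global minimum near w ~ -1.0.
def batch_grad(w):
    return 4.0 * w**3 - 4.0 * w + 0.5

# Fixed "training set" of zero-mean perturbations; the gradient on example x is the
# batch gradient plus x, so averaging over the whole set recovers the batch gradient.
train_set = rng.normal(0.0, 4.0, size=200)
train_set -= train_set.mean()

def example_grad(w, x):
    return batch_grad(w) + x

eta = 0.02            # constant learning rate
n_updates = 20000
w_batch = 1.0         # start both processes in the basin of the shallow local minimum
w_simple = 1.0

for _ in range(n_updates):
    # batch training: deterministic gradient descent on the averaged error
    w_batch -= eta * batch_grad(w_batch)
    # simple backpropagation: one randomly drawn training example per update
    x = rng.choice(train_set)
    w_simple -= eta * example_grad(w_simple, x)

# Batch training stays in the local minimum whose basin it started in; the noisy
# constant-eta process typically diffuses over the barrier and settles near the
# deeper minimum, while for eta -> 0 it would instead track the batch trajectory.
print(f"batch training:         w = {w_batch:+.3f}")
print(f"simple backpropagation: w = {w_simple:+.3f}")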

Cite

Text

Finnoff. "Diffusion Approximations for the Constant Learning Rate Backpropagation Algorithm and Resistence to Local Minima." Neural Information Processing Systems, 1992.

Markdown

[Finnoff. "Diffusion Approximations for the Constant Learning Rate Backpropagation Algorithm and Resistence to Local Minima." Neural Information Processing Systems, 1992.](https://mlanthology.org/neurips/1992/finnoff1992neurips-diffusion/)

BibTeX

@inproceedings{finnoff1992neurips-diffusion,
  title     = {{Diffusion Approximations for the Constant Learning Rate Backpropagation Algorithm and Resistence to Local Minima}},
  author    = {Finnoff, William},
  booktitle = {Neural Information Processing Systems},
  year      = {1992},
  pages     = {459--466},
  url       = {https://mlanthology.org/neurips/1992/finnoff1992neurips-diffusion/}
}