Asymptotics of Gradient-Based Neural Network Training Algorithms

Abstract

We study the asymptotic properties of the sequence of iterates of weight-vector estimates obtained by training a multilayer feedforward neural network with a basic gradient-descent method using a fixed learning constant and no batch-processing. In the one-dimensional case, an exact analysis establishes the existence of a limiting distribution that is not Gaussian in general. For the general case and small learning constant, a linearization approximation permits the application of results from the theory of random matrices to again establish the existence of a limiting distribution. We study the first few moments of this distribution to compare and contrast the results of our analysis with those of techniques of stochastic approximation.
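The training scheme the abstract refers to is per-sample gradient descent with a constant step size that is never annealed. The sketch below illustrates this setting on a small one-hidden-layer network; the network shape, data model, and all names are illustrative assumptions for exposition, not the paper's setup. With the learning constant held fixed, the weight iterates do not converge to a point but keep fluctuating in steady state, which is the limiting distribution of the iterates that the paper characterizes.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative data stream: noisy scalar target from random inputs (assumed model).
def sample():
    x = rng.normal(size=2)
    y = np.tanh(x[0] - x[1]) + 0.1 * rng.normal()
    return x, y

# One-hidden-layer feedforward network with H hidden tanh units.
H = 4
W1 = rng.normal(scale=0.5, size=(H, 2))
b1 = np.zeros(H)
w2 = rng.normal(scale=0.5, size=H)
b2 = 0.0

eta = 0.05          # fixed learning constant (never annealed)
iterates = []       # record w2 to observe its steady-state fluctuation

for t in range(20000):
    x, y = sample()
    h = np.tanh(W1 @ x + b1)        # hidden activations
    y_hat = w2 @ h + b2             # network output
    err = y_hat - y                 # d(0.5*err^2)/d(y_hat)

    # Backpropagated gradients of the squared-error loss on this single sample
    # (no batching: one update per presented example).
    g_w2 = err * h
    g_b2 = err
    g_pre = (err * w2) * (1.0 - h**2)   # through the tanh nonlinearity
    g_W1 = np.outer(g_pre, x)
    g_b1 = g_pre

    # Constant-step update: the iterates settle into a stationary
    # distribution around a minimum rather than converging to it.
    W1 -= eta * g_W1
    b1 -= eta * g_b1
    w2 -= eta * g_w2
    b2 -= eta * g_b2

    if t >= 10000:                  # discard the transient, keep steady state
        iterates.append(w2.copy())

steady = np.array(iterates)
print("steady-state mean of w2:", steady.mean(axis=0))
print("steady-state std  of w2:", steady.std(axis=0))

Running this, the standard deviation of the recorded iterates stays bounded away from zero, consistent with a non-degenerate limiting distribution; shrinking eta shrinks the spread, matching the small-learning-constant regime the abstract analyzes.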

Cite

Text

Mukherjee and Fine. "Asymptotics of Gradient-Based Neural Network Training Algorithms." Neural Information Processing Systems, 1994.

Markdown

[Mukherjee and Fine. "Asymptotics of Gradient-Based Neural Network Training Algorithms." Neural Information Processing Systems, 1994.](https://mlanthology.org/neurips/1994/mukherjee1994neurips-asymptotics/)

BibTeX

@inproceedings{mukherjee1994neurips-asymptotics,
  title     = {{Asymptotics of Gradient-Based Neural Network Training Algorithms}},
  author    = {Mukherjee, Sayandev and Fine, Terrence L.},
  booktitle = {Neural Information Processing Systems},
  year      = {1994},
  pages     = {335--342},
  url       = {https://mlanthology.org/neurips/1994/mukherjee1994neurips-asymptotics/}
}