The Efficiency and the Robustness of Natural Gradient Descent Learning Rule
Abstract
The inverse of the Fisher information matrix is used in the natural gradient descent algorithm to train single-layer and multi-layer perceptrons. We have discovered a new scheme to represent the Fisher information matrix of a stochastic multi-layer perceptron. Based on this scheme, we have designed an algorithm to compute the natural gradient. When the input dimension n is much larger than the number of hidden neurons, the complexity of this algorithm is of order O(n). It is confirmed by simulations that the natural gradient descent learning rule is not only efficient but also robust.
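To make the learning rule concrete: natural gradient descent preconditions the ordinary gradient with the inverse Fisher information matrix, updating parameters as theta <- theta - lr * F^{-1} * grad. The sketch below is a generic illustration of that update on a single-layer logistic perceptron, not the paper's O(n) scheme for stochastic multi-layer perceptrons; the function name `natural_gradient_step`, the `damping` ridge term, and the toy data are all assumptions added for the example, and the naive solve here costs O(n^3) per step.

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1, damping=1e-4):
    # Natural gradient update: theta <- theta - lr * F^{-1} * grad.
    # `damping` adds a small ridge so the solve stays well conditioned;
    # it is an implementation convenience, not part of the paper.
    d = theta.shape[0]
    return theta - lr * np.linalg.solve(fisher + damping * np.eye(d), grad)

# Toy single-layer (logistic) perceptron on synthetic data.
rng = np.random.default_rng(0)
n = 5                                        # input dimension
X = rng.normal(size=(200, n))
w_true = rng.normal(size=n)
y = (X @ w_true > 0).astype(float)

w = np.zeros(n)
for _ in range(100):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))       # model probabilities
    grad = X.T @ (p - y) / len(y)            # mean log-loss gradient
    # Fisher information for the logistic model: E[p(1-p) x x^T],
    # estimated here from the training sample.
    F = (X * (p * (1 - p))[:, None]).T @ X / len(y)
    w = natural_gradient_step(w, grad, F, lr=0.5)
```

Preconditioning by F^{-1} makes the step invariant to smooth reparameterizations of the model, which is the source of the efficiency and robustness the abstract refers to; the paper's contribution is a representation of F for stochastic multi-layer perceptrons that brings the per-step cost down to O(n) when n far exceeds the number of hidden neurons.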
Cite
Text
Yang and Amari. "The Efficiency and the Robustness of Natural Gradient Descent Learning Rule." Neural Information Processing Systems, 1997.
Markdown
[Yang and Amari. "The Efficiency and the Robustness of Natural Gradient Descent Learning Rule." Neural Information Processing Systems, 1997.](https://mlanthology.org/neurips/1997/yang1997neurips-efficiency/)
BibTeX
@inproceedings{yang1997neurips-efficiency,
  title = {{The Efficiency and the Robustness of Natural Gradient Descent Learning Rule}},
  author = {Yang, Howard Hua and Amari, Shun-ichi},
  booktitle = {Neural Information Processing Systems},
  year = {1997},
  pages = {385-391},
  url = {https://mlanthology.org/neurips/1997/yang1997neurips-efficiency/}
}