Diagrammatic Derivation of Gradient Algorithms for Neural Networks

Abstract

Deriving gradient algorithms for time-dependent neural network structures typically requires numerous chain rule expansions, diligent bookkeeping, and careful manipulation of terms. In this paper, we show how to derive such algorithms via a set of simple block diagram manipulation rules. The approach provides a common framework to derive popular algorithms including backpropagation and backpropagation-through-time without a single chain rule expansion. Additional examples are provided for a variety of complicated architectures to illustrate both the generality and the simplicity of the approach.
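As a point of reference for the "chain rule expansions" the paper eliminates, here is a minimal sketch (not the paper's diagrammatic method) of a hand-derived gradient for a toy scalar two-layer network, checked against finite differences. All names and values are illustrative assumptions, not taken from the paper.

```python
import math

def loss(w1, w2, x, t):
    """Forward pass: scalar two-layer network y = w2 * tanh(w1 * x)."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return 0.5 * (y - t) ** 2

def grads_chain_rule(w1, w2, x, t):
    """Gradients via explicit chain rule expansions (hand-derived)."""
    h = math.tanh(w1 * x)
    y = w2 * h
    e = y - t                          # dL/dy
    dw2 = e * h                        # dL/dw2 = dL/dy * dy/dw2
    dw1 = e * w2 * (1 - h ** 2) * x    # dL/dw1, through tanh'(u) = 1 - tanh(u)^2
    return dw1, dw2

# Sanity check: compare hand-derived gradients to central finite differences.
w1, w2, x, t, eps = 0.8, 1.2, 0.5, 1.0, 1e-6
g1, g2 = grads_chain_rule(w1, w2, x, t)
fd1 = (loss(w1 + eps, w2, x, t) - loss(w1 - eps, w2, x, t)) / (2 * eps)
fd2 = (loss(w1, w2 + eps, x, t) - loss(w1, w2 - eps, x, t)) / (2 * eps)
```

Even for this two-parameter network the derivation requires tracking one chain-rule factor per layer; the bookkeeping grows quickly for time-dependent architectures, which is the burden the diagrammatic rules are designed to remove.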

Cite

Text

Wan and Beaufays. "Diagrammatic Derivation of Gradient Algorithms for Neural Networks." Neural Computation, 1996. doi:10.1162/NECO.1996.8.1.182

Markdown

[Wan and Beaufays. "Diagrammatic Derivation of Gradient Algorithms for Neural Networks." Neural Computation, 1996.](https://mlanthology.org/neco/1996/wan1996neco-diagrammatic/) doi:10.1162/NECO.1996.8.1.182

BibTeX

@article{wan1996neco-diagrammatic,
  title     = {{Diagrammatic Derivation of Gradient Algorithms for Neural Networks}},
  author    = {Wan, Eric A. and Beaufays, Françoise},
  journal   = {Neural Computation},
  year      = {1996},
  pages     = {182--201},
  doi       = {10.1162/NECO.1996.8.1.182},
  volume    = {8},
  url       = {https://mlanthology.org/neco/1996/wan1996neco-diagrammatic/}
}