Analyzing and Improving Surrogate Gradient Training in Binary Neural Networks Using Dynamical Systems Theory

Abstract

Training binary recurrent networks on tasks that span long time horizons is challenging because the discrete activation function renders the error landscape non-differentiable. Surrogate gradient training replaces the discrete activation function with a differentiable one in the backward pass, but it still suffers from exploding and vanishing gradients. We leverage the connection between gradient stability and Lyapunov exponents to address this issue from a dynamical systems perspective, extending our previous work on Lyapunov exponent regularization to non-differentiable systems. We use differentiable linear algebra to regularize surrogate Lyapunov exponents, a method we call surrogate gradient flossing. We show that surrogate gradient flossing enhances performance on temporally demanding tasks.
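
The abstract describes two ingredients: a surrogate gradient for the discrete activation, and Lyapunov exponents of the resulting surrogate dynamics computed with differentiable linear algebra (QR decompositions). The source does not include code, so the sketch below is only an illustrative reconstruction in JAX under assumptions that are not taken from the paper: binary_step uses a sigmoid-derivative surrogate with slope beta = 5.0, rnn_step is a generic binary RNN update, and flossing_penalty simply pushes the leading surrogate Lyapunov exponents toward zero. None of these names or choices are guaranteed to match the authors' implementation.

import jax
import jax.numpy as jnp

# Binary (Heaviside) activation with a surrogate derivative in the backward pass.
@jax.custom_vjp
def binary_step(x):
    return (x > 0.0).astype(x.dtype)

def _binary_step_fwd(x):
    return binary_step(x), x

def _binary_step_bwd(x, g):
    beta = 5.0                          # assumed surrogate sharpness (illustrative)
    s = jax.nn.sigmoid(beta * x)
    return (g * beta * s * (1.0 - s),)  # derivative of a sigmoid as the surrogate

binary_step.defvjp(_binary_step_fwd, _binary_step_bwd)

def rnn_step(h, x, W, U, b):
    # Generic binary RNN update (illustrative, not necessarily the paper's architecture).
    return binary_step(W @ h + U @ x + b)

def surrogate_lyapunov_exponents(W, U, b, xs, h0, k=4):
    # Benettin-style estimate of the leading k Lyapunov exponents of the surrogate
    # dynamics: propagate an orthonormal frame with the surrogate Jacobian and
    # re-orthonormalize with a (differentiable) QR decomposition at each step.
    n = h0.shape[0]
    Q = jnp.eye(n)[:, :k]
    log_r = jnp.zeros(k)
    h = h0
    for x in xs:
        J = jax.jacobian(lambda hh: rnn_step(hh, x, W, U, b))(h)  # uses the surrogate derivative
        Q, R = jnp.linalg.qr(J @ Q)
        log_r = log_r + jnp.log(jnp.abs(jnp.diag(R)))
        h = rnn_step(h, x, W, U, b)
    return log_r / len(xs)

def flossing_penalty(W, U, b, xs, h0, k=4):
    # Hypothetical regularizer: push surrogate Lyapunov exponents toward zero so that
    # long products of Jacobians neither explode nor vanish.
    les = surrogate_lyapunov_exponents(W, U, b, xs, h0, k)
    return jnp.sum(les ** 2)

# Example: gradients of the penalty flow through QR, log, and the surrogate
# derivative, so it can be added to a task loss during training.
key = jax.random.PRNGKey(0)
n, m, T = 16, 3, 20
W = 0.1 * jax.random.normal(key, (n, n))
U = 0.1 * jax.random.normal(jax.random.fold_in(key, 1), (n, m))
b = jnp.zeros(n)
xs = jax.random.normal(jax.random.fold_in(key, 2), (T, m))
h0 = jnp.zeros(n)
grads = jax.grad(flossing_penalty, argnums=(0, 1, 2))(W, U, b, xs, h0)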

Cite

Text

Engelken and Abbott. "Analyzing and Improving Surrogate Gradient Training in Binary Neural Networks Using Dynamical Systems Theory." ICML 2024 Workshops: Differentiable Almost Everything, 2024.

Markdown

[Engelken and Abbott. "Analyzing and Improving Surrogate Gradient Training in Binary Neural Networks Using Dynamical Systems Theory." ICML 2024 Workshops: Differentiable Almost Everything, 2024.](https://mlanthology.org/icmlw/2024/engelken2024icmlw-analyzing/)

BibTeX

@inproceedings{engelken2024icmlw-analyzing,
  title     = {{Analyzing and Improving Surrogate Gradient Training in Binary Neural Networks Using Dynamical Systems Theory}},
  author    = {Engelken, Rainer and Abbott, Larry},
  booktitle = {ICML 2024 Workshops: Differentiable Almost Everything},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/engelken2024icmlw-analyzing/}
}