Non Vanishing Gradients for Arbitrarily Deep Neural Networks: A Hamiltonian System Approach

Abstract

Training Deep Neural Networks (DNNs) can be difficult due to vanishing or exploding gradients during weight optimization through backpropagation. To address this problem, we propose a general class of Hamiltonian DNNs (H-DNNs) that stems from the discretization of continuous-time Hamiltonian systems. Our main result is that a broad set of H-DNNs ensures non-vanishing gradients by design for an arbitrary network depth. This is obtained by proving that, using a semi-implicit Euler discretization scheme, the backward sensitivity matrices involved in gradient computations are symplectic.
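To make the discretization idea concrete, below is a minimal sketch of one semi-implicit (symplectic) Euler step for a separable Hamiltonian-style layer. The parameterization H(p, q) = 1ᵀ log cosh(Kq + b) + ½‖p‖² is an illustrative assumption, not necessarily the exact H-DNN architecture of the paper; it only shows how one state block is updated using the already-updated other block.

```python
import numpy as np

def semi_implicit_euler_layer(p, q, K, b, h):
    """One semi-implicit (symplectic) Euler step.

    Hypothetical Hamiltonian (for illustration only):
        H(p, q) = sum(log(cosh(K @ q + b))) + 0.5 * p @ p
    with dynamics dp/dt = -dH/dq and dq/dt = dH/dp.
    """
    # Update p first, using the current q: dH/dq = K^T tanh(K q + b).
    p_next = p - h * K.T @ np.tanh(K @ q + b)
    # Then update q using the already-updated p: dH/dp = p.
    q_next = q + h * p_next
    return p_next, q_next
```

Under this kind of update, the Jacobian of the layer map is symplectic, which is the property the abstract invokes to guarantee that backward sensitivity matrices, and hence gradients, do not vanish with depth.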

Cite

Text

Galimberti et al. "Non Vanishing Gradients for Arbitrarily Deep Neural Networks: A Hamiltonian System Approach." NeurIPS 2021 Workshops: DLDE, 2021.

Markdown

[Galimberti et al. "Non Vanishing Gradients for Arbitrarily Deep Neural Networks: A Hamiltonian System Approach." NeurIPS 2021 Workshops: DLDE, 2021.](https://mlanthology.org/neuripsw/2021/galimberti2021neuripsw-non/)

BibTeX

@inproceedings{galimberti2021neuripsw-non,
  title     = {{Non Vanishing Gradients for Arbitrarily Deep Neural Networks: A Hamiltonian System Approach}},
  author    = {Galimberti, Clara and Furieri, Luca and Xu, Liang and Ferrari-Trecate, Giancarlo},
  booktitle = {NeurIPS 2021 Workshops: DLDE},
  year      = {2021},
  url       = {https://mlanthology.org/neuripsw/2021/galimberti2021neuripsw-non/}
}