Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

Abstract

We investigate the parameter-space geometry of recurrent neural networks (RNNs) and develop an adaptation of the path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve the trainability of ReLU RNNs compared to RNNs trained with SGD, even with various recently suggested initialization schemes.
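For readers unfamiliar with path-SGD, the sketch below is a rough illustration and is not taken from the paper: it shows one common form of the path-normalized update for a two-layer feedforward ReLU network without biases, implemented in NumPy. Each gradient coordinate is divided by a per-edge scaling factor kappa_e, the sum over all input-to-output paths through edge e of the product of the other edges' squared weights. The RNN adaptation studied in the paper additionally handles weights shared across time steps, which this toy example does not capture; all names and sizes here are made up for illustration.

import numpy as np

rng = np.random.default_rng(0)
d, h, k = 5, 8, 3          # input, hidden, output sizes (arbitrary)
W1 = rng.normal(scale=0.5, size=(h, d))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(k, h))   # hidden -> output weights

def path_sgd_step(W1, W2, x, y, lr=0.1):
    # Forward pass with ReLU hidden units.
    z1 = W1 @ x
    a1 = np.maximum(z1, 0.0)
    out = W2 @ a1
    # Squared-error loss gradients via plain backprop.
    err = out - y
    gW2 = np.outer(err, a1)
    gW1 = np.outer((W2.T @ err) * (z1 > 0), x)
    # Path-scaling factors computed from squared weights:
    #   kappa1[j, i] = sum_o W2[o, j]^2   (same value for every input i)
    #   kappa2[o, j] = sum_i W1[j, i]^2   (same value for every output o)
    kappa1 = np.tile((W2 ** 2).sum(axis=0)[:, None], (1, d))
    kappa2 = np.tile((W1 ** 2).sum(axis=1)[None, :], (k, 1))
    eps = 1e-8  # guard against division by zero
    # Path-normalized update: rescale each gradient coordinate by 1/kappa.
    W1 = W1 - lr * gW1 / (kappa1 + eps)
    W2 = W2 - lr * gW2 / (kappa2 + eps)
    return W1, W2

x = rng.normal(size=d)
y = rng.normal(size=k)
W1, W2 = path_sgd_step(W1, W2, x, y)

Dividing by kappa makes the step approximately invariant to the node-wise rescalings that leave a ReLU network's function unchanged, which is the parameter-space geometry the abstract refers to.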

Cite

Text

Neyshabur et al. "Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations." Neural Information Processing Systems, 2016.

Markdown

[Neyshabur et al. "Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations." Neural Information Processing Systems, 2016.](https://mlanthology.org/neurips/2016/neyshabur2016neurips-pathnormalized/)

BibTeX

@inproceedings{neyshabur2016neurips-pathnormalized,
  title     = {{Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations}},
  author    = {Neyshabur, Behnam and Wu, Yuhuai and Salakhutdinov, Ruslan and Srebro, Nati},
  booktitle = {Neural Information Processing Systems},
  year      = {2016},
  pages     = {3477--3485},
  url       = {https://mlanthology.org/neurips/2016/neyshabur2016neurips-pathnormalized/}
}