Learning Recurrent Neural Networks with Hessian-Free Optimization

Abstract

In this work we resolve the long-standing problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. Utilizing recent advances in the Hessian-free optimization approach \citep{hf}, together with a novel damping scheme, we successfully train RNNs on two sets of challenging problems: first, a collection of pathological synthetic datasets known to be impossible for standard optimization approaches (due to their extremely long-term dependencies), and second, three natural and highly complex real-world sequence datasets, on which our method significantly outperforms the previous state-of-the-art method for training neural sequence models, the Long Short-Term Memory approach of \citet{lstm}. Additionally, we offer a new interpretation of the generalized Gauss-Newton matrix of \citet{schraudolph}, which is used within the HF approach of Martens.
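
In the HF approach the curvature matrix is never formed explicitly; conjugate gradient only needs products Gv with the generalized Gauss-Newton matrix G = J^T H_L J, where J is the Jacobian of the network outputs with respect to the parameters and H_L is the Hessian of the loss with respect to those outputs. The following is a minimal JAX sketch of that matrix-vector primitive, not the authors' implementation; the function name, model, and toy data are illustrative assumptions.

import jax
import jax.numpy as jnp

def gauss_newton_vector_product(f, loss, params, v):
    # Illustrative sketch (hypothetical helper, not from the paper).
    # Computes Gv = J^T H_L J v for a model f: params -> outputs and a
    # loss mapping outputs to a scalar.
    outputs, Jv = jax.jvp(f, (params,), (v,))            # forward mode: J v
    HJv = jax.jvp(jax.grad(loss), (outputs,), (Jv,))[1]  # H_L (J v)
    _, vjp_fn = jax.vjp(f, params)                       # reverse mode: J^T (.)
    return vjp_fn(HJv)[0]

# Toy usage (hypothetical data): a one-layer tanh model with an MSE loss,
# whose output-space Hessian is positive semi-definite, as the generalized
# Gauss-Newton construction requires.
X = jnp.ones((4, 3))
y = jnp.zeros(4)
f = lambda w: jnp.tanh(X @ w)
loss = lambda out: jnp.mean((out - y) ** 2)
print(gauss_newton_vector_product(f, loss, jnp.ones(3), 0.1 * jnp.ones(3)))

The paper's novel damping scheme further modifies the curvature matrix that conjugate gradient sees; the sketch above shows only the undamped Gauss-Newton product.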

Cite

Text

Martens and Sutskever. "Learning Recurrent Neural Networks with Hessian-Free Optimization." International Conference on Machine Learning, 2011.

Markdown

[Martens and Sutskever. "Learning Recurrent Neural Networks with Hessian-Free Optimization." International Conference on Machine Learning, 2011.](https://mlanthology.org/icml/2011/martens2011icml-learning/)

BibTeX

@inproceedings{martens2011icml-learning,
  title     = {{Learning Recurrent Neural Networks with Hessian-Free Optimization}},
  author    = {Martens, James and Sutskever, Ilya},
  booktitle = {International Conference on Machine Learning},
  year      = {2011},
  pages     = {1033--1040},
  url       = {https://mlanthology.org/icml/2011/martens2011icml-learning/}
}