Learning Recurrent Neural Networks with Hessian-Free Optimization
Abstract
In this work we resolve the long-outstanding problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. Utilizing recent advances in the Hessian-free (HF) optimization approach (Martens, 2010), together with a novel damping scheme, we successfully train RNNs on two sets of challenging problems: first, a collection of pathological synthetic datasets which are known to be impossible for standard optimization approaches (due to their extremely long-term dependencies), and second, three natural and highly complex real-world sequence datasets, on which we find that our method significantly outperforms the previous state-of-the-art method for training neural sequence models: the Long Short-term Memory approach of Hochreiter and Schmidhuber (1997). Additionally, we offer a new interpretation of the generalized Gauss-Newton matrix of Schraudolph (2002), which is used within the HF approach of Martens.
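The abstract does not spell out how an HF step works, so here is a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical two-parameter model `w[0] * tanh(w[1] * x)` with squared-error loss in place of an RNN; for that loss the generalized Gauss-Newton matrix reduces to the classical Gauss-Newton matrix G = JᵀJ. The update direction solves (G + λI) d = -g with conjugate gradient, using only products Gv, so G is never formed. The damping here is plain Tikhonov (Levenberg-Marquardt style) damping with illustrative constants; it is not the paper's novel damping scheme. All names (`hessian_free`, `gn_vec`, `XS`, `YS`, etc.) are hypothetical.

```python
import math

# Hypothetical toy data: targets generated by y = 2 * tanh(0.5 * x).
XS = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
YS = [2.0 * math.tanh(0.5 * x) for x in XS]


def predict(w, x):
    # Toy nonlinear model f(w; x) = w[0] * tanh(w[1] * x), standing in for an RNN.
    return w[0] * math.tanh(w[1] * x)


def loss(w):
    # Squared-error loss 0.5 * sum_i (f(w; x_i) - y_i)^2.
    return 0.5 * sum((predict(w, x) - y) ** 2 for x, y in zip(XS, YS))


def jacobian_rows(w):
    # Row i holds d f(w; x_i) / d w (one row per data point).
    rows = []
    for x in XS:
        t = math.tanh(w[1] * x)
        rows.append([t, w[0] * (1.0 - t * t) * x])
    return rows


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def gradient(w, J):
    # g = J^T r, where r holds the residuals f(w; x_i) - y_i.
    r = [predict(w, x) - y for x, y in zip(XS, YS)]
    return [sum(J[i][k] * r[i] for i in range(len(J))) for k in range(len(w))]


def gn_vec(J, v, lam):
    # Damped Gauss-Newton product (J^T J + lam * I) v, computed as J^T (J v) + lam * v.
    Jv = [dot(row, v) for row in J]
    return [sum(J[i][k] * Jv[i] for i in range(len(J))) + lam * v[k] for k in range(len(v))]


def conjugate_gradient(matvec, b, iters=50, tol=1e-20):
    # Standard CG for A x = b (A symmetric positive definite), starting at x = 0.
    # Stops when the squared residual norm drops below tol.
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def hessian_free(w, lam=1.0, steps=30):
    for _ in range(steps):
        J = jacobian_rows(w)
        g = gradient(w, J)
        # Approximately solve (G + lam * I) d = -g without forming G.
        d = conjugate_gradient(lambda v: gn_vec(J, v, lam), [-gi for gi in g])
        new_w = [wi + di for wi, di in zip(w, d)]
        # Reduction predicted by the undamped quadratic model q(d) = g.d + 0.5 d^T G d.
        predicted = dot(g, d) + 0.5 * dot(d, gn_vec(J, d, 0.0))
        actual = loss(new_w) - loss(w)
        rho = actual / predicted if predicted < 0 else 0.0
        # Levenberg-Marquardt style damping adjustment (illustrative constants).
        if rho < 0.25:
            lam *= 1.5
        elif rho > 0.75:
            lam *= 2.0 / 3.0
        if actual < 0:
            w = new_w  # accept only steps that reduce the loss
    return w, lam
```

Calling `hessian_free([1.0, 1.0])` should drive the parameters toward `[2.0, 0.5]`. An RNN version would replace the explicit Jacobian rows with Jacobian-vector and vector-Jacobian products computed by forward and backward passes through time.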
Cite
Text
Martens and Sutskever. "Learning Recurrent Neural Networks with Hessian-Free Optimization." International Conference on Machine Learning, 2011.
Markdown
[Martens and Sutskever. "Learning Recurrent Neural Networks with Hessian-Free Optimization." International Conference on Machine Learning, 2011.](https://mlanthology.org/icml/2011/martens2011icml-learning/)
BibTeX
@inproceedings{martens2011icml-learning,
title = {{Learning Recurrent Neural Networks with Hessian-Free Optimization}},
author = {Martens, James and Sutskever, Ilya},
booktitle = {International Conference on Machine Learning},
year = {2011},
pages = {1033--1040},
url = {https://mlanthology.org/icml/2011/martens2011icml-learning/}
}