adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs
Abstract
Recurrent Neural Networks (RNNs) are powerful models that achieve exceptional performance on several pattern recognition problems. However, the training of RNNs is a computationally difficult task owing to the well-known "vanishing/exploding" gradient problem. Algorithms proposed for training RNNs either exploit no (or limited) curvature information and have cheap per-iteration complexity, or attempt to gain significant curvature information at the cost of increased per-iteration cost. The former set includes diagonally-scaled first-order methods such as ADAGRAD and ADAM, while the latter consists of second-order algorithms like Hessian-Free Newton and K-FAC. In this paper, we present adaQN, a stochastic quasi-Newton algorithm for training RNNs. Our approach retains a low per-iteration cost while allowing for non-diagonal scaling through a stochastic L-BFGS updating scheme. The method uses a novel L-BFGS scaling initialization scheme and is judicious in storing and retaining L-BFGS curvature pairs. We present numerical experiments on two language modeling tasks and show that adaQN is competitive with popular RNN training algorithms.
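As a rough illustration of the non-diagonal scaling the abstract mentions, the sketch below shows the textbook L-BFGS two-loop recursion, which applies an implicit inverse-Hessian approximation built from stored curvature pairs (s, y) to a gradient. It is not the adaQN algorithm itself: the initial scaling used here is the standard choice gamma = s^T y / y^T y rather than the paper's novel initialization, and the helper names (`dot`, `lbfgs_direction`, `maybe_store`) and the curvature threshold `eps` are assumptions introduced only for this example.

```python
from collections import deque


def dot(a, b):
    """Inner product of two equal-length sequences of floats."""
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(grad, pairs):
    """Return the search direction -H * grad using the two-loop recursion.

    `pairs` holds curvature pairs (s, y), oldest first, where s is a step
    in parameter space and y is the corresponding change in gradient.
    """
    q = list(grad)
    history = []  # (alpha, rho) for each pair, newest first
    for s, y in reversed(pairs):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        history.append((alpha, rho))
        q = [qi - alpha * yi for qi, yi in zip(q, y)]

    # Textbook initial scaling (an assumption here, not adaQN's scheme).
    if pairs:
        s, y = pairs[-1]
        gamma = dot(s, y) / dot(y, y)
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]

    # Second loop runs oldest to newest, so reverse the stored history.
    for (s, y), (alpha, rho) in zip(pairs, reversed(history)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


def maybe_store(pairs, s, y, eps=1e-8):
    """Store (s, y) only if it has sufficiently positive curvature.

    A common safeguard in stochastic quasi-Newton methods; the exact
    acceptance rule used by adaQN is not given in the abstract.
    """
    if dot(s, y) > eps * dot(s, s):
        pairs.append((s, y))
        return True
    return False


# Usage: a bounded memory of pairs, then a direction for a new gradient.
memory = deque(maxlen=5)
maybe_store(memory, [1.0, 0.0], [2.0, 0.0])
direction = lbfgs_direction([2.0, 4.0], memory)
```

With an empty memory the recursion reduces to the negative gradient, so the same function also covers the first iterations before any pair has been accepted. The bounded `deque` discards the oldest pair automatically, which keeps the per-iteration cost proportional to the memory size times the number of parameters.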
Cite
Text
Keskar and Berahas. "adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016. doi:10.1007/978-3-319-46128-1_1
Markdown
[Keskar and Berahas. "adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016.](https://mlanthology.org/ecmlpkdd/2016/keskar2016ecmlpkdd-adaqn/) doi:10.1007/978-3-319-46128-1_1
BibTeX
@inproceedings{keskar2016ecmlpkdd-adaqn,
title = {{adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs}},
author = {Keskar, Nitish Shirish and Berahas, Albert S.},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2016},
pages = {1-16},
doi = {10.1007/978-3-319-46128-1_1},
url = {https://mlanthology.org/ecmlpkdd/2016/keskar2016ecmlpkdd-adaqn/}
}