adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs

Abstract

Recurrent Neural Networks (RNNs) are powerful models that achieve exceptional performance on several pattern recognition problems. However, training RNNs is computationally difficult owing to the well-known "vanishing/exploding" gradient problem. Algorithms proposed for training RNNs either exploit no (or limited) curvature information and have a cheap per-iteration cost, or attempt to gain significant curvature information at the expense of a higher per-iteration cost. The former set includes diagonally scaled first-order methods such as ADAGRAD and ADAM, while the latter consists of second-order algorithms like Hessian-Free Newton and K-FAC. In this paper, we present adaQN, a stochastic quasi-Newton algorithm for training RNNs. Our approach retains a low per-iteration cost while allowing for non-diagonal scaling through a stochastic L-BFGS updating scheme. The method uses a novel L-BFGS scaling initialization scheme and is judicious in storing and retaining L-BFGS curvature pairs. We present numerical experiments on two language modeling tasks and show that adaQN is competitive with popular RNN training algorithms.
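The abstract describes adaQN only at a high level. As a minimal sketch, the generic L-BFGS two-loop recursion below shows how stored curvature pairs (s, y) turn a diagonal initial matrix H0 into a non-diagonal scaling of the gradient, and how a simple curvature test can decide which pairs to keep. The ADAGRAD-style H0, the threshold `eps`, the memory size, and all function names are illustrative assumptions, not the paper's exact initialization or pair-retention rules.

```python
import math
from collections import deque
from typing import Deque, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adagrad_style_h0(grad_sq_sum: Vector, eps: float = 1e-8) -> Vector:
    """Hypothetical diagonal H0 built from accumulated squared gradients
    (ADAGRAD-style). Only the diagonal entries are returned."""
    return [1.0 / (math.sqrt(g) + eps) for g in grad_sq_sum]


def maybe_store_pair(pairs: Deque[Tuple[Vector, Vector]], s: Vector, y: Vector,
                     eps: float = 1e-4) -> bool:
    """Keep (s, y) only if s^T y > eps * s^T s, which keeps the L-BFGS matrix
    positive definite. A deque created with maxlen=m drops the oldest pair."""
    if dot(s, y) > eps * dot(s, s):
        pairs.append((s, y))
        return True
    return False


def two_loop(grad: Vector, pairs: Deque[Tuple[Vector, Vector]],
             h0_diag: Vector) -> Vector:
    """Return H @ grad, where H is the L-BFGS inverse-Hessian approximation
    built from the stored pairs (oldest first) on top of diagonal H0."""
    q = list(grad)
    coeffs = []  # (rho, alpha) per pair, recorded newest-first
    # First loop: newest pair to oldest.
    for s, y in reversed(pairs):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        coeffs.append((rho, alpha))
    # Apply the diagonal initial matrix H0.
    r = [h * qi for h, qi in zip(h0_diag, q)]
    # Second loop: oldest pair to newest, reusing the stored coefficients.
    for (s, y), (rho, alpha) in zip(pairs, reversed(coeffs)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return r
```

A step would then move along `[-d for d in two_loop(g, pairs, h0)]`. Because H0 is diagonal and H is never formed explicitly, each application costs O(mn) for m stored pairs in n dimensions, which is consistent with the low per-iteration cost the abstract emphasizes.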

Cite

Text

Keskar and Berahas. "adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016. doi:10.1007/978-3-319-46128-1_1

Markdown

[Keskar and Berahas. "adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016.](https://mlanthology.org/ecmlpkdd/2016/keskar2016ecmlpkdd-adaqn/) doi:10.1007/978-3-319-46128-1_1

BibTeX

@inproceedings{keskar2016ecmlpkdd-adaqn,
  title     = {{adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs}},
  author    = {Keskar, Nitish Shirish and Berahas, Albert S.},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2016},
  pages     = {1--16},
  doi       = {10.1007/978-3-319-46128-1_1},
  url       = {https://mlanthology.org/ecmlpkdd/2016/keskar2016ecmlpkdd-adaqn/}
}