Memory-Efficient Backpropagation Through Time

Abstract

We propose a novel approach to reduce memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs). Our approach uses dynamic programming to balance a trade-off between caching of intermediate results and recomputation. The algorithm is capable of tightly fitting within almost any user-set memory budget while finding an optimal execution policy minimizing the computational cost. Computational devices have limited memory capacity and maximizing a computational performance given a fixed memory budget is a practical use-case. We provide asymptotic computational upper bounds for various regimes. The algorithm is particularly effective for long sequences. For sequences of length 1000, our algorithm saves 95\% of memory usage while using only one third more time per iteration than the standard BPTT.

Cite

Text

Gruslys et al. "Memory-Efficient Backpropagation Through Time." Neural Information Processing Systems, 2016.

Markdown

[Gruslys et al. "Memory-Efficient Backpropagation Through Time." Neural Information Processing Systems, 2016.](https://mlanthology.org/neurips/2016/gruslys2016neurips-memoryefficient/)

BibTeX

@inproceedings{gruslys2016neurips-memoryefficient,
  title     = {{Memory-Efficient Backpropagation Through Time}},
  author    = {Gruslys, Audrunas and Munos, Remi and Danihelka, Ivo and Lanctot, Marc and Graves, Alex},
  booktitle = {Neural Information Processing Systems},
  year      = {2016},
  pages     = {4125-4133},
  url       = {https://mlanthology.org/neurips/2016/gruslys2016neurips-memoryefficient/}
}