SBO-RNN: Reformulating Recurrent Neural Networks via Stochastic Bilevel Optimization
Abstract
In this paper we consider the training stability of recurrent neural networks (RNNs) and propose a family of RNNs, namely SBO-RNN, that can be formulated using stochastic bilevel optimization (SBO). With the help of stochastic gradient descent (SGD), we convert the SBO problem into an RNN in which the feedforward pass and backpropagation solve the lower-level and upper-level optimization problems for learning the hidden states and their hyperparameters, respectively. We prove that under mild conditions there is no vanishing or exploding gradient when training SBO-RNN. Empirically, we demonstrate that our approach achieves superior performance on several benchmark datasets, with fewer parameters, less training data, and much faster convergence. Code is available at https://zhang-vislab.github.io.
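The core mechanism described in the abstract, where each recurrent transition is an SGD step on a lower-level objective and backpropagation trains the upper-level parameters, can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not the authors' released code; the quadratic lower-level objective, the layer names `U` and `W`, and the fixed `step_size` are assumptions made here for clarity.

```python
import torch
import torch.nn as nn


class SBORNNCell(nn.Module):
    """Minimal sketch of an SBO-style recurrent cell.

    Assumption (not taken from the paper's code): the lower-level objective
    is a simple quadratic energy f(h) = 0.5 * ||h - tanh(U x + W h_prev)||^2,
    whose SGD step defines the hidden-state update. The cell parameters
    (U, W) play the role of upper-level variables trained by backpropagation.
    """

    def __init__(self, input_size, hidden_size, step_size=0.1):
        super().__init__()
        self.U = nn.Linear(input_size, hidden_size, bias=False)
        self.W = nn.Linear(hidden_size, hidden_size, bias=False)
        self.step_size = step_size

    def lower_level_grad(self, h, x, h_prev):
        # Gradient of the assumed lower-level objective with respect to h.
        target = torch.tanh(self.U(x) + self.W(h_prev))
        return h - target

    def forward(self, x, h_prev):
        # One SGD step on the lower-level problem replaces the usual RNN
        # transition; backpropagating through it updates the upper level.
        return h_prev - self.step_size * self.lower_level_grad(h_prev, x, h_prev)


# Usage: unroll the cell over a sequence as with any recurrent cell.
cell = SBORNNCell(input_size=8, hidden_size=16)
h = torch.zeros(4, 16)                      # batch of 4 hidden states
for x_t in torch.randn(10, 4, 8):           # sequence length 10
    h = cell(x_t, h)
```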
Cite
Text
Zhang et al. "SBO-RNN: Reformulating Recurrent Neural Networks via Stochastic Bilevel Optimization." Neural Information Processing Systems, 2021.
Markdown
[Zhang et al. "SBO-RNN: Reformulating Recurrent Neural Networks via Stochastic Bilevel Optimization." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/zhang2021neurips-sbornn/)
BibTeX
@inproceedings{zhang2021neurips-sbornn,
title = {{SBO-RNN: Reformulating Recurrent Neural Networks via Stochastic Bilevel Optimization}},
author = {Zhang, Ziming and Yue, Yun and Wu, Guojun and Li, Yanhua and Zhang, Haichong},
booktitle = {Neural Information Processing Systems},
year = {2021},
url = {https://mlanthology.org/neurips/2021/zhang2021neurips-sbornn/}
}