Long Expressive Memory for Sequence Modeling

Abstract

We propose a novel method called Long Expressive Memory (LEM) for learning long-term sequential dependencies. LEM is gradient-based, can efficiently process sequential tasks with very long-term dependencies, and is sufficiently expressive to learn complicated input-output maps. To derive LEM, we consider a system of multiscale ordinary differential equations, as well as a suitable time-discretization of this system. For LEM, we derive rigorous bounds to show the mitigation of the exploding and vanishing gradients problem, a well-known challenge for gradient-based recurrent sequential learning methods. We also prove that LEM can approximate a large class of dynamical systems to high accuracy. Our empirical results, ranging from image and time-series classification through dynamical systems prediction to speech recognition and language modeling, demonstrate that LEM outperforms state-of-the-art recurrent neural networks, gated recurrent units, and long short-term memory models.
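To make the "multiscale ODEs plus time-discretization" idea concrete, the following is a minimal, illustrative Python sketch of a two-scale gated recurrent cell in the spirit of the construction described above. It is not the authors' released implementation: the class name, parameter names (W, V, b), and the exact gating structure are assumptions chosen for clarity; only the general pattern (state-dependent time steps gating an explicit update of two coupled hidden states) is taken from the abstract.

```python
# Illustrative sketch (not the authors' code): a two-scale gated recurrent cell
# obtained by an explicit time-discretization of a multiscale ODE system.
# All names and the precise gating structure are assumptions for illustration.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TwoScaleRecurrentCell:
    def __init__(self, input_dim, hidden_dim, dt=1.0, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # Four affine maps: two produce state-dependent time-step gates,
        # two produce candidate updates for the fast (z) and slow (y) states.
        self.W = {k: rng.uniform(-scale, scale, (hidden_dim, hidden_dim)) for k in "12zy"}
        self.V = {k: rng.uniform(-scale, scale, (hidden_dim, input_dim)) for k in "12zy"}
        self.b = {k: np.zeros(hidden_dim) for k in "12zy"}
        self.dt = dt

    def step(self, u, y, z):
        # State-dependent time steps, one per hidden unit, in (0, dt):
        # these play the role of learned multiple time scales.
        dt_z = self.dt * sigmoid(self.W["1"] @ y + self.V["1"] @ u + self.b["1"])
        dt_y = self.dt * sigmoid(self.W["2"] @ y + self.V["2"] @ u + self.b["2"])
        # Explicit (forward-Euler-like) update of the two coupled hidden states.
        z_new = (1.0 - dt_z) * z + dt_z * np.tanh(self.W["z"] @ y + self.V["z"] @ u + self.b["z"])
        y_new = (1.0 - dt_y) * y + dt_y * np.tanh(self.W["y"] @ z_new + self.V["y"] @ u + self.b["y"])
        return y_new, z_new

# Usage: run the cell over a random input sequence.
cell = TwoScaleRecurrentCell(input_dim=3, hidden_dim=8)
y = z = np.zeros(8)
for u in np.random.default_rng(1).normal(size=(20, 3)):
    y, z = cell.step(u, y, z)
print(y.shape)  # (8,)
```

The convex-combination form of each update, (1 - dt) * state + dt * candidate, is what keeps the per-unit step sizes bounded in (0, dt); the paper's gradient bounds are derived for its specific formulation, which this sketch only approximates.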

Cite

Text

Rusch et al. "Long Expressive Memory for Sequence Modeling." International Conference on Learning Representations, 2022.

Markdown

[Rusch et al. "Long Expressive Memory for Sequence Modeling." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/rusch2022iclr-long/)

BibTeX

@inproceedings{rusch2022iclr-long,
  title     = {{Long Expressive Memory for Sequence Modeling}},
  author    = {Rusch, T. Konstantin and Mishra, Siddhartha and Erichson, N. Benjamin and Mahoney, Michael W.},
  booktitle = {International Conference on Learning Representations},
  year      = {2022},
  url       = {https://mlanthology.org/iclr/2022/rusch2022iclr-long/}
}