Provable Reinforcement Learning with a Short-Term Memory

Abstract

Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial observability in general is extremely challenging, as a number of worst-case statistical and computational barriers are known in learning Partially Observable Markov Decision Processes (POMDPs). Motivated by the problem structure in several physical applications, as well as a commonly used technique known as "frame stacking", this paper proposes to study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length m. We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of observations is enormous). In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially with the short length m rather than the problem horizon, and is independent of the number of observations. Our results show that a short-term memory suffices for reinforcement learning in these environments.

Cite

Text

Efroni et al. "Provable Reinforcement Learning with a Short-Term Memory." International Conference on Machine Learning, 2022.

Markdown

[Efroni et al. "Provable Reinforcement Learning with a Short-Term Memory." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/efroni2022icml-provable/)

BibTeX

@inproceedings{efroni2022icml-provable,
  title     = {{Provable Reinforcement Learning with a Short-Term Memory}},
  author    = {Efroni, Yonathan and Jin, Chi and Krishnamurthy, Akshay and Miryoosefi, Sobhan},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {5832-5850},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/efroni2022icml-provable/}
}