Reinforcement Learning as One Big Sequence Modeling Problem
Abstract
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models, leveraging the Markov property to factorize the problem in time. However, we can also view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards. Viewed in this way, it is tempting to consider whether powerful, high-capacity sequence prediction models that work well in other domains, such as natural-language processing, can also provide simple and effective solutions to the RL problem. To this end, we explore how RL can be reframed as "one big sequence modeling" problem, using state-of-the-art Transformer architectures to model distributions over sequences of states, actions, and rewards. Addressing RL as a sequence modeling problem significantly simplifies a range of design decisions: we no longer require separate behavior policy constraints, as is common in prior work on offline model-free RL, and we no longer require ensembles or other epistemic uncertainty estimators, as is common in prior work on model-based RL. All of these roles are filled by the same Transformer sequence model. In our experiments, we demonstrate the flexibility of this approach across long-horizon dynamics prediction, imitation learning, goal-conditioned RL, and offline RL.
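To make the framing concrete, below is a minimal sketch (not the authors' released code) of the core idea: discretize each trajectory's states, actions, and rewards into tokens, flatten them into one long sequence, and fit an autoregressive Transformer with a next-token prediction loss. The dimensionalities, bin counts, and the small GPT-style model are illustrative assumptions.

```python
# Minimal sketch: offline trajectories as one token sequence, modeled autoregressively.
# All sizes below (VOCAB, S_DIM, A_DIM, model width/depth) are assumed for illustration.
import torch
import torch.nn as nn

VOCAB = 100          # discretization bins per dimension (assumed)
S_DIM, A_DIM = 4, 2  # state / action dimensionality (assumed)
TOKENS_PER_STEP = S_DIM + A_DIM + 1  # state dims, action dims, one reward token

def discretize(x, low=-1.0, high=1.0, bins=VOCAB):
    """Map continuous values in [low, high] to integer bin indices."""
    x = x.clamp(low, high)
    return ((x - low) / (high - low) * (bins - 1)).long()

class TrajectoryGPT(nn.Module):
    """A small causal Transformer over flattened (state, action, reward) tokens."""
    def __init__(self, d_model=128, n_layers=2, n_heads=4, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):
        B, T = tokens.shape
        pos = torch.arange(T, device=tokens.device)
        h = self.tok_emb(tokens) + self.pos_emb(pos)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        return self.head(self.blocks(h, mask=causal_mask))

# Training step: predict every next token in the flattened trajectory.
model = TrajectoryGPT()
traj = torch.rand(8, 20, TOKENS_PER_STEP) * 2 - 1  # fake batch: 8 trajectories, 20 steps
tokens = discretize(traj).reshape(8, -1)           # flatten time and per-step dimensions
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
loss.backward()
```

Under this framing, acting amounts to decoding from the same model: conditioning on the tokens observed so far and sampling (or searching over) continuations, rather than querying a separate policy, dynamics model, or value ensemble.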
Cite
Text
Janner et al. "Reinforcement Learning as One Big Sequence Modeling Problem." ICML 2021 Workshops: URL, 2021.
Markdown
[Janner et al. "Reinforcement Learning as One Big Sequence Modeling Problem." ICML 2021 Workshops: URL, 2021.](https://mlanthology.org/icmlw/2021/janner2021icmlw-reinforcement/)
BibTeX
@inproceedings{janner2021icmlw-reinforcement,
title = {{Reinforcement Learning as One Big Sequence Modeling Problem}},
author = {Janner, Michael and Li, Qiyang and Levine, Sergey},
booktitle = {ICML 2021 Workshops: URL},
year = {2021},
url = {https://mlanthology.org/icmlw/2021/janner2021icmlw-reinforcement/}
}