Reinforcement Learning as One Big Sequence Modeling Problem
Abstract
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models, leveraging the Markov property to factorize the problem in time. However, we can also view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards. Viewed in this way, it is tempting to consider whether powerful, high-capacity sequence prediction models that work well in other domains, such as natural-language processing, can also provide simple and effective solutions to the RL problem. To this end, we explore how RL can be reframed as "one big sequence modeling" problem, using state-of-the-art Transformer architectures to model distributions over sequences of states, actions, and rewards. Addressing RL as a sequence modeling problem significantly simplifies a range of design decisions: we no longer require separate behavior policy constraints, as is common in prior work on offline model-free RL, and we no longer require ensembles or other epistemic uncertainty estimators, as is common in prior work on model-based RL. All of these roles are filled by the same Transformer sequence model. In our experiments, we demonstrate the flexibility of this approach across long-horizon dynamics prediction, imitation learning, goal-conditioned RL, and offline RL.
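To make the framing concrete, below is a minimal sketch (not the authors' released code) of the core idea: discretize each trajectory's states, actions, and rewards into tokens, flatten them into one long sequence, and fit an autoregressive Transformer with a next-token prediction loss. The dimensionalities, bin counts, and the small GPT-style model are illustrative assumptions.

```python
# Minimal sketch: offline trajectories as one token sequence, modeled autoregressively.
# All sizes below (VOCAB, S_DIM, A_DIM, model width/depth) are assumed for illustration.
import torch
import torch.nn as nn

VOCAB = 100          # discretization bins per dimension (assumed)
S_DIM, A_DIM = 4, 2  # state / action dimensionality (assumed)
TOKENS_PER_STEP = S_DIM + A_DIM + 1  # state dims, action dims, one reward token

def discretize(x, low=-1.0, high=1.0, bins=VOCAB):
    """Map continuous values in [low, high] to integer bin indices."""
    x = x.clamp(low, high)
    return ((x - low) / (high - low) * (bins - 1)).long()

class TrajectoryGPT(nn.Module):
    """A small causal Transformer over flattened (state, action, reward) tokens."""
    def __init__(self, d_model=128, n_layers=2, n_heads=4, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):
        B, T = tokens.shape
        pos = torch.arange(T, device=tokens.device)
        h = self.tok_emb(tokens) + self.pos_emb(pos)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        return self.head(self.blocks(h, mask=causal_mask))

# Training step: predict every next token in the flattened trajectory.
model = TrajectoryGPT()
traj = torch.rand(8, 20, TOKENS_PER_STEP) * 2 - 1  # fake batch: 8 trajectories, 20 steps
tokens = discretize(traj).reshape(8, -1)           # flatten time and per-step dimensions
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
loss.backward()
```

Under this framing, acting amounts to decoding from the same model: conditioning on the tokens observed so far and sampling (or searching over) continuations, rather than querying a separate policy, dynamics model, or value ensemble.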
Cite
Text
Janner et al. "Reinforcement Learning as One Big Sequence Modeling Problem." ICML 2021 Workshops: URL, 2021.
Markdown
[Janner et al. "Reinforcement Learning as One Big Sequence Modeling Problem." ICML 2021 Workshops: URL, 2021.](https://mlanthology.org/icmlw/2021/janner2021icmlw-reinforcement/)
BibTeX
@inproceedings{janner2021icmlw-reinforcement,
title = {{Reinforcement Learning as One Big Sequence Modeling Problem}},
author = {Janner, Michael and Li, Qiyang and Levine, Sergey},
booktitle = {ICML 2021 Workshops: URL},
year = {2021},
url = {https://mlanthology.org/icmlw/2021/janner2021icmlw-reinforcement/}
}