StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning

Abstract

Reinforcement Learning (RL) can be cast as a sequence modeling task: given a sequence of past state-action-reward experiences, an agent predicts a sequence of next actions. In this work, we propose State-Action-Reward Transformer (StARformer) for visual RL, which explicitly models short-term state-action-reward representations (StAR-representations), essentially introducing a Markovian-like inductive bias to improve long-term modeling. Our approach first extracts StAR-representations by self-attending image state patches, action, and reward tokens within a short temporal window. These are then combined with pure image state representations, extracted as convolutional features, to perform self-attention over the whole sequence. Our experiments show that StARformer outperforms the state-of-the-art Transformer-based method on image-based Atari and DeepMind Control Suite benchmarks, in both offline-RL and imitation learning settings. StARformer also handles longer input sequences more effectively. Our code is available at https://github.com/elicassion/StARformer.
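The two-step design described in the abstract can be sketched as follows: a short-term attention pass within each (state, action, reward) step produces one StAR token per step, and a long-term attention pass runs over the whole trajectory, interleaving StAR tokens with per-step convolutional state features. This is a minimal single-head NumPy sketch, not the paper's implementation; all function names, dimensions, and the shared/unmasked attention weights are illustrative assumptions (the actual model uses multi-layer Transformers with causal masking over the long-term sequence).

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # shared embedding dimension (illustrative, not from the paper)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    # single-head scaled dot-product self-attention over a set of tokens
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return scores @ v

# one set of projection weights, shared across both passes for brevity
wq, wk, wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))

def star_token(patch_tokens, action_tok, reward_tok):
    # Step 1 (short-term): image patch tokens, the action token, and the
    # reward token of a single step attend to each other, then are
    # mean-pooled into one StAR token for that step.
    group = np.vstack([patch_tokens, action_tok[None], reward_tok[None]])
    return self_attention(group, wq, wk, wv).mean(axis=0)

def starformer_layer(steps, conv_feats):
    # Step 2 (long-term): attend over the whole trajectory, interleaving
    # StAR tokens with pure convolutional state features.
    star = np.stack([star_token(*s) for s in steps])  # (T, D)
    seq = np.empty((2 * len(steps), D))
    seq[0::2], seq[1::2] = star, conv_feats           # interleave tokens
    return self_attention(seq, wq, wk, wv)            # (2T, D)

# toy trajectory: T = 3 steps, 4 image patches per state
T, P = 3, 4
steps = [(rng.standard_normal((P, D)),   # patch tokens of state s_t
          rng.standard_normal(D),        # action token a_t
          rng.standard_normal(D))        # reward token r_t
         for _ in range(T)]
conv_feats = rng.standard_normal((T, D)) # conv features of each state
out = starformer_layer(steps, conv_feats)
print(out.shape)  # one StAR token + one conv-feature token per step
```

In this sketch the short-term pass restricts attention to a single step's tokens, which is what introduces the Markovian-like inductive bias, while the long-term pass still sees the full sequence.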

Cite

Text

Shang et al. "StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19842-7_27

Markdown

[Shang et al. "StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/shang2022eccv-starformer/) doi:10.1007/978-3-031-19842-7_27

BibTeX

@inproceedings{shang2022eccv-starformer,
  title     = {{StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning}},
  author    = {Shang, Jinghuan and Kahatapitiya, Kumara and Li, Xiang and Ryoo, Michael S.},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-19842-7_27},
  url       = {https://mlanthology.org/eccv/2022/shang2022eccv-starformer/}
}