Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning

Abstract

Exploration in sparse-reward reinforcement learning (RL) is difficult because long, coordinated sequences of actions are needed to achieve any reward. Moreover, in continuous action spaces there are infinitely many possible actions, which only increases the difficulty of exploration. One class of methods designed to address these issues forms temporally extended actions, often called skills, from interaction data collected in the same domain, and optimizes a policy on top of this new action space. Such methods require a lengthy pretraining phase to form the skills before reinforcement learning can begin. Given prior evidence that the full range of the continuous action space is not required in such tasks, we propose a novel approach to skill generation with two components. First, we discretize the action space through clustering; second, we leverage a tokenization technique borrowed from natural language processing to generate temporally extended actions. Using this as the action space for RL outperforms comparable skill-based approaches in several challenging sparse-reward domains while requiring orders of magnitude less computation.
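
The two components above can be sketched concretely. The following is a minimal illustration, not the authors' implementation: it assumes k-means as the clustering method and BPE-style greedy pair merging as the tokenization technique, neither of which is specified in the abstract, and all function names are hypothetical.

# Illustrative sketch of the two-component pipeline described in the abstract.
# Assumptions: k-means for discretization, BPE-style merges for tokenization.
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans


def discretize_actions(trajectories, n_clusters=8, seed=0):
    """Cluster continuous actions so each action maps to a discrete token."""
    all_actions = np.concatenate(trajectories, axis=0)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    kmeans.fit(all_actions)
    # Each trajectory becomes a "sentence" of discrete action tokens.
    return [list(kmeans.predict(traj)) for traj in trajectories], kmeans


def bpe_skills(token_seqs, n_merges=16):
    """Greedily merge the most frequent adjacent token pair (BPE-style).

    Each merged pair is a temporally extended action, i.e. a skill."""
    # Start from unit-length tokens; a token is a tuple of cluster indices.
    seqs = [[(t,) for t in seq] for seq in token_seqs]
    skills = []
    for _ in range(n_merges):
        pairs = Counter((a, b) for seq in seqs for a, b in zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merged = a + b  # concatenation of the primitive-action sequences
        skills.append(merged)
        # Rewrite every sequence, replacing the pair with the merged skill.
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    # Each skill is an ordered tuple of cluster indices; executing a skill
    # replays the corresponding k-means centroids as a fixed action sequence.
    return skills

A downstream RL agent would then act in the discrete space of these skills (plus, optionally, the primitive cluster tokens), which is the "new action space" the abstract refers to.
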

Cite

Text

Yunis et al. "Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning." NeurIPS 2023 Workshops: GenPlan, 2023.

Markdown

[Yunis et al. "Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning." NeurIPS 2023 Workshops: GenPlan, 2023.](https://mlanthology.org/neuripsw/2023/yunis2023neuripsw-subwords/)

BibTeX

@inproceedings{yunis2023neuripsw-subwords,
  title     = {{Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning}},
  author    = {Yunis, David and Jung, Justin and Dai, Falcon and Walter, Matthew},
  booktitle = {NeurIPS 2023 Workshops: GenPlan},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/yunis2023neuripsw-subwords/}
}