Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning

Abstract

Exploration in sparse-reward reinforcement learning (RL) is difficult due to the need for long, coordinated sequences of actions in order to achieve any reward. Skill learning, from demonstrations or interaction, is a promising approach to address this, but skill extraction and inference are expensive for current methods. We present a novel method to extract skills from demonstrations for use in sparse-reward RL, inspired by the popular Byte-Pair Encoding (BPE) algorithm in natural language processing. With these skills, we show strong performance in a variety of tasks, as well as 1000$\times$ acceleration for skill extraction and 100$\times$ acceleration for policy inference. Given the simplicity of our method, skills extracted from 1\% of the demonstrations in one task can be transferred to a new, loosely related task. We also note that such a method yields a finite set of interpretable behaviors. Our code is available at https://github.com/dyunis/subwords_as_skills.
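The abstract does not spell out the extraction procedure, so the sketch below is only an illustration of the general idea: run BPE-style merging over sequences of discretized demonstration actions, so that frequently co-occurring action pairs are fused into longer "skills". All names, the discretization assumption (e.g., k-means over continuous actions), and the toy data are hypothetical, not the paper's exact recipe; see the linked repository for the authors' implementation.

```python
from collections import Counter

def most_frequent_pair(sequences):
    """Count adjacent token pairs across all sequences; return the most common."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None

def merge_pair(seq, pair, new_token):
    """Replace each non-overlapping occurrence of `pair` with `new_token`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def bpe_skills(sequences, num_merges):
    """Run BPE over discretized action sequences. Each merge defines a new
    token (a candidate skill) composed of two existing tokens."""
    vocab = {}  # new_token -> (left_token, right_token)
    next_token = max(max(s) for s in sequences) + 1
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        vocab[next_token] = pair
        sequences = [merge_pair(s, pair, next_token) for s in sequences]
        next_token += 1
    return vocab

def expand(token, vocab):
    """Unroll a skill token back into its primitive action sequence."""
    if token not in vocab:
        return [token]
    left, right = vocab[token]
    return expand(left, vocab) + expand(right, vocab)

# Toy demonstrations: sequences of discretized primitive actions.
demos = [[0, 1, 0, 1, 2], [0, 1, 0, 1, 0, 1]]
skills = bpe_skills(demos, num_merges=3)
for tok in skills:
    print(tok, "->", expand(tok, skills))
```

Because each learned token expands deterministically into a fixed sequence of primitive actions, the resulting skill set is finite and directly inspectable, which is consistent with the abstract's note that the method yields interpretable behaviors.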

Cite

Text

Yunis et al. "Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning." Neural Information Processing Systems, 2024. doi:10.52202/079017-2161

Markdown

[Yunis et al. "Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/yunis2024neurips-subwords/) doi:10.52202/079017-2161

BibTeX

@inproceedings{yunis2024neurips-subwords,
  title     = {{Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning}},
  author    = {Yunis, David and Jung, Justin and Dai, Falcon Z. and Walter, Matthew R.},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2161},
  url       = {https://mlanthology.org/neurips/2024/yunis2024neurips-subwords/}
}