Augmenting Policy Learning with Routines Discovered from a Single Demonstration

Zhao, Zelin; Gan, Chuang; Wu, Jiajun; Guo, Xiaoxiao; Tenenbaum, Joshua B.

doi:10.1609/AAAI.V35I12.17316

AAAI 2021 pp. 11024-11032

Abstract

Humans can abstract prior knowledge from very little data and use it to boost skill learning. In this paper, we propose routine-augmented policy learning (RAPL), which discovers routines composed of primitive actions from a single demonstration and uses the discovered routines to augment policy learning. To discover routines from the demonstration, we first abstract routine candidates by identifying a grammar over the demonstrated action trajectory. Then, the best routines, measured by length and frequency, are selected to form a routine library. We propose to learn the policy simultaneously at the primitive level and the routine level with the discovered routines, leveraging the temporal structure of routines. Our approach enables imitating expert behavior at multiple temporal scales for imitation learning and promotes exploration in reinforcement learning. Extensive experiments on Atari games demonstrate that RAPL improves the state-of-the-art imitation learning method SQIL and the reinforcement learning method A2C. Further, we show that the discovered routines can generalize to unseen levels and difficulties on the CoinRun benchmark.
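
The routine-discovery idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: in place of grammar induction it simply counts repeated contiguous action subsequences, and the `length * frequency` score, the function names (`discover_routines`, `augment_action_space`), and the parameters (`max_len`, `top_k`, `min_count`) are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Routine = Tuple[int, ...]


def discover_routines(
    actions: Sequence[int],
    max_len: int = 4,
    top_k: int = 3,
    min_count: int = 2,
) -> List[Routine]:
    """Pick repeated action subsequences from one demonstration.

    Simplification (assumption): candidates are contiguous n-grams,
    counted with overlaps, rather than rules of an induced grammar.
    """
    counts: Counter = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(actions) - n + 1):
            counts[tuple(actions[i:i + n])] += 1

    # Keep only subsequences that actually repeat in the demonstration.
    candidates = [(r, c) for r, c in counts.items() if c >= min_count]

    # Hypothetical score: length * frequency, ties broken by tuple order.
    candidates.sort(key=lambda rc: (-len(rc[0]) * rc[1], rc[0]))
    return [r for r, _ in candidates[:top_k]]


def augment_action_space(
    num_primitives: int, routines: Sequence[Routine]
) -> List[Routine]:
    """Return primitive actions (as 1-tuples) followed by the routines."""
    return [(a,) for a in range(num_primitives)] + list(routines)


if __name__ == "__main__":
    demo = [0, 1, 2, 0, 1, 2, 0, 1, 2, 3]  # made-up action trajectory
    library = discover_routines(demo, top_k=1)
    print(library)
    print(augment_action_space(4, library))
```

A policy acting over the augmented list would choose either a single primitive or a whole routine at each decision point; how RAPL trains such a policy is described in the paper itself and is not reproduced here.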

Cite

Text

Zhao et al. "Augmenting Policy Learning with Routines Discovered from a Single Demonstration." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I12.17316

Markdown

[Zhao et al. "Augmenting Policy Learning with Routines Discovered from a Single Demonstration." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/zhao2021aaai-augmenting/) doi:10.1609/AAAI.V35I12.17316

BibTeX

@inproceedings{zhao2021aaai-augmenting,
  title     = {{Augmenting Policy Learning with Routines Discovered from a Single Demonstration}},
  author    = {Zhao, Zelin and Gan, Chuang and Wu, Jiajun and Guo, Xiaoxiao and Tenenbaum, Joshua B.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {11024-11032},
  doi       = {10.1609/AAAI.V35I12.17316},
  url       = {https://mlanthology.org/aaai/2021/zhao2021aaai-augmenting/}
}