Provable Hierarchy-Based Meta-Reinforcement Learning

Abstract

Hierarchical reinforcement learning (HRL) has seen widespread interest as an approach to tractable learning of complex modular behaviors. However, existing works either assume access to expert-constructed hierarchies or use hierarchy-learning heuristics with no provable guarantees. To address this gap, we analyze HRL in the meta-RL setting, where a learner learns latent hierarchical structure during meta-training for use in a downstream task. We consider a tabular setting in which natural hierarchical structure is embedded in the transition dynamics. Analogous to supervised meta-learning theory, we provide diversity conditions which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy. Furthermore, we provide regret bounds for a learner using the recovered hierarchy to solve a meta-test task. Our bounds incorporate common notions from the HRL literature, such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.
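As a point of reference for the "optimism-based algorithm" mentioned in the abstract, the sketch below shows generic optimism-driven exploration in a tabular episodic MDP, using UCBVI-style count-based bonuses. This is not the algorithm from the paper; the function name, array shapes, and bonus scale c are all illustrative assumptions.

import numpy as np

def optimistic_value_iteration(counts, reward_sums, next_state_counts, horizon, c=1.0):
    # Generic UCB-style optimistic planning from empirical counts (a sketch,
    # not the paper's method).
    # counts:            (S, A) visit counts for each state-action pair
    # reward_sums:       (S, A) summed observed rewards
    # next_state_counts: (S, A, S) next-state visit counts
    S, A = counts.shape
    V = np.zeros(S)                                  # value at step h+1 (terminal: 0)
    Q = np.zeros((horizon, S, A))
    for h in reversed(range(horizon)):
        n = np.maximum(counts, 1)                    # avoid division by zero
        r_hat = reward_sums / n                      # empirical mean rewards
        p_hat = next_state_counts / n[:, :, None]    # empirical transition estimates
        bonus = c * np.sqrt(1.0 / n)                 # optimism bonus ~ 1/sqrt(visits)
        Q[h] = np.minimum(horizon, r_hat + bonus + p_hat @ V)
        V = Q[h].max(axis=1)                         # greedy backup
    return Q

Acting greedily with respect to the optimistic Q-values steers the learner toward under-visited state-action pairs, which is the basic mechanism behind sample-efficiency guarantees in this line of work.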

Cite

Text

Chua et al. "Provable Hierarchy-Based Meta-Reinforcement Learning." Artificial Intelligence and Statistics, 2023.

Markdown

[Chua et al. "Provable Hierarchy-Based Meta-Reinforcement Learning." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/chua2023aistats-provable/)

BibTeX

@inproceedings{chua2023aistats-provable,
  title     = {{Provable Hierarchy-Based Meta-Reinforcement Learning}},
  author    = {Chua, Kurtland and Lei, Qi and Lee, Jason D.},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {10918--10967},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/chua2023aistats-provable/}
}