Learning Hierarchical Planning-Based Policies from Offline Data

Abstract

Hierarchical policy architectures incorporating some planning component into the top level have shown superior performance and generalization in agent navigation tasks. Cost or safety reasons may, however, prevent training in an online (RL) fashion with continuous environment interaction. We therefore propose HORIBLe-VRN, an algorithm to learn a hierarchical policy with a top-level planning-based module from pre-collected data. A key challenge is to deal with the unknown, latent high-level (HL) actions. Our algorithm features an EM-style hierarchical imitation learning stage, incorporating HL action inference, and a subsequent offline RL refinement stage for the top-level policy. We empirically evaluate HORIBLe-VRN in a long-horizon, sparse-reward agent navigation task, investigating performance, generalization capabilities, and robustness with respect to sub-optimal demonstration data.

Cite

Text

Wöhlke et al. "Learning Hierarchical Planning-Based Policies from Offline Data." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43421-1_29

Markdown

[Wöhlke et al. "Learning Hierarchical Planning-Based Policies from Offline Data." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/wohlke2023ecmlpkdd-learning/) doi:10.1007/978-3-031-43421-1_29

BibTeX

@inproceedings{wohlke2023ecmlpkdd-learning,
  title     = {{Learning Hierarchical Planning-Based Policies from Offline Data}},
  author    = {Wöhlke, Jan and Schmitt, Felix and van Hoof, Herke},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2023},
  pages     = {489--505},
  doi       = {10.1007/978-3-031-43421-1_29},
  url       = {https://mlanthology.org/ecmlpkdd/2023/wohlke2023ecmlpkdd-learning/}
}