Learning Hierarchical Planning-Based Policies from Offline Data
Abstract
Hierarchical policy architectures that incorporate a planning component at the top level have shown superior performance and generalization in agent navigation tasks. Cost or safety reasons may, however, prevent training in an online (RL) fashion with continuous environment interaction. We therefore propose HORIBLe-VRN, an algorithm to learn a hierarchical policy with a top-level planning-based module from pre-collected data. A key challenge is dealing with the unknown, latent high-level (HL) actions. Our algorithm features an EM-style hierarchical imitation learning stage, incorporating HL action inference, and a subsequent offline RL refinement stage for the top-level policy. We empirically evaluate HORIBLe-VRN in a long-horizon, sparse-reward agent navigation task, investigating performance, generalization capabilities, and robustness with respect to sub-optimal demonstration data.
Cite
Text
Wöhlke et al. "Learning Hierarchical Planning-Based Policies from Offline Data." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43421-1_29
Markdown
[Wöhlke et al. "Learning Hierarchical Planning-Based Policies from Offline Data." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/wohlke2023ecmlpkdd-learning/) doi:10.1007/978-3-031-43421-1_29
BibTeX
@inproceedings{wohlke2023ecmlpkdd-learning,
title = {{Learning Hierarchical Planning-Based Policies from Offline Data}},
author = {Wöhlke, Jan and Schmitt, Felix and van Hoof, Herke},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2023},
pages = {489-505},
doi = {10.1007/978-3-031-43421-1_29},
url = {https://mlanthology.org/ecmlpkdd/2023/wohlke2023ecmlpkdd-learning/}
}