Partially Observable Hierarchical Reinforcement Learning with AI Planning (Student Abstract)

Abstract

Partially observable Markov decision processes (POMDPs) challenge reinforcement learning agents due to incomplete knowledge of the environment. Even assuming monotonicity in uncertainty, it is difficult for an agent to know how and when to stop exploring for a given task. In this abstract, we discuss how to use hierarchical reinforcement learning (HRL) and AI Planning (AIP) to improve exploration when the agent knows possible valuations of unknown predicates and how to discover them. By encoding the uncertainty in an abstract planning model, the agent can derive a high-level plan which is then used to decompose the overall POMDP into a tree of semi-POMDPs for training. We evaluate our agent's performance on the MiniGrid domain and show how guided exploration may improve agent performance.
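
As a rough illustration of the decomposition idea described above (not the authors' implementation), the sketch below builds a small plan tree that branches on each possible valuation of one unknown predicate. Every name here (`PlanNode`, `build_plan_tree`, `subtasks`, and the operator strings such as `find-key`) is hypothetical and chosen only for the example. It assumes that each node of the tree would correspond to one sub-task on which a low-level policy could be trained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class PlanNode:
    """One high-level operator; assumed here to map to one sub-task."""
    operator: str
    # Successors keyed by the discovered value of the unknown predicate.
    # The empty-string key marks an unconditional successor.
    children: Dict[str, "PlanNode"] = field(default_factory=dict)


def chain(operators: List[str]) -> PlanNode:
    """Link a non-empty linear operator list into a chain of nodes."""
    head = PlanNode(operators[0])
    node = head
    for op in operators[1:]:
        node.children[""] = PlanNode(op)
        node = node.children[""]
    return head


def build_plan_tree(
    sense_op: str,
    valuations: List[str],
    plan_for: Callable[[str], List[str]],
) -> PlanNode:
    """Root is a sensing operator; one branch per possible valuation."""
    root = PlanNode(sense_op)
    for value in valuations:
        root.children[value] = chain(plan_for(value))
    return root


def subtasks(node: PlanNode, prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Enumerate every root-to-leaf operator sequence in the tree."""
    path = prefix + (node.operator,)
    if not node.children:
        return [path]
    paths: List[Tuple[str, ...]] = []
    for child in node.children.values():
        paths.extend(subtasks(child, path))
    return paths


# Hypothetical example: the key's location is the unknown predicate.
tree = build_plan_tree(
    "find-key",
    ["room-a", "room-b"],
    lambda room: [f"goto-{room}", "pickup-key", "open-door"],
)
branches = subtasks(tree)
```

Here `branches` holds one operator sequence per valuation of the unknown predicate, so the sensing step at the root tells the agent which branch to follow and, in that sense, when exploration can stop.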

Cite

Text

Rozek et al. "Partially Observable Hierarchical Reinforcement Learning with AI Planning (Student Abstract)." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I21.30504

Markdown

[Rozek et al. "Partially Observable Hierarchical Reinforcement Learning with AI Planning (Student Abstract)." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/rozek2024aaai-partially/) doi:10.1609/AAAI.V38I21.30504

BibTeX

@inproceedings{rozek2024aaai-partially,
  title     = {{Partially Observable Hierarchical Reinforcement Learning with AI Planning (Student Abstract)}},
  author    = {Rozek, Brandon and Lee, Junkyu and Kokel, Harsha and Katz, Michael and Sohrabi, Shirin},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {23635--23636},
  doi       = {10.1609/AAAI.V38I21.30504},
  url       = {https://mlanthology.org/aaai/2024/rozek2024aaai-partially/}
}