Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs

Abstract

Partially observable Markov decision processes (POMDPs) are the standard model for planning under uncertainty over both finite and infinite horizons. Besides the well-known discounted-sum objective, the indefinite-horizon objective (a.k.a. Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel and heuristic search value iteration (HSVI) have been used to solve Goal-POMDPs. Neither algorithm has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We make the following contributions: (1) we discuss the challenges that Goal-POMDPs introduce and illustrate how they prevent the original HSVI from converging; (2) we present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that it has convergence guarantees; (3) we show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.
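
As a sketch in our own notation (not the paper's): writing G for the set of target states, c(s, a) > 0 for the per-transition cost, and T_G for the first time a target state is reached, the indefinite-horizon objective asks for a policy π minimizing the expected total cost accumulated before hitting G:

  \min_{\pi} \; \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{T_G - 1} c(s_t, a_t) \right],
  \qquad T_G = \min\{\, t \ge 0 : s_t \in G \,\}.

Since T_G may be unbounded under some policies, the sum has no fixed horizon, which is what distinguishes this setting from the finite-horizon and discounted cases.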

Cite

Text

Horák et al. "Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs." International Joint Conference on Artificial Intelligence, 2018. doi:10.24963/IJCAI.2018/662

Markdown

[Horák et al. "Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs." International Joint Conference on Artificial Intelligence, 2018.](https://mlanthology.org/ijcai/2018/horak2018ijcai-goal/) doi:10.24963/IJCAI.2018/662

BibTeX

@inproceedings{horak2018ijcai-goal,
  title     = {{Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs}},
  author    = {Horák, Karel and Bošanský, Branislav and Chatterjee, Krishnendu},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {4764--4770},
  doi       = {10.24963/IJCAI.2018/662},
  url       = {https://mlanthology.org/ijcai/2018/horak2018ijcai-goal/}
}