Solving Partially Observable Stochastic Shortest-Path Games

Abstract

We study the two-player zero-sum extension of the partially observable stochastic shortest-path problem where one agent has only partial information about the environment. We formulate this problem as a partially observable stochastic game (POSG): given a set of target states and negative rewards for each transition, the player with imperfect information maximizes the expected undiscounted total reward until a target state is reached. The second player with the perfect information aims for the opposite. We base our formalism on POSGs with one-sided observability (OS-POSGs) and give the following contributions: (1) we introduce a novel heuristic search value iteration algorithm that iteratively solves depth-limited variants of the game, (2) we derive the bound on the depth guaranteeing an arbitrary precision, (3) we propose a novel upper-bound estimation that allows early terminations, and (4) we experimentally evaluate the algorithm on a pursuit-evasion game.

Cite

Text

Tomásek et al. "Solving Partially Observable Stochastic Shortest-Path Games." International Joint Conference on Artificial Intelligence, 2021. doi:10.24963/IJCAI.2021/575

Markdown

[Tomásek et al. "Solving Partially Observable Stochastic Shortest-Path Games." International Joint Conference on Artificial Intelligence, 2021.](https://mlanthology.org/ijcai/2021/tomasek2021ijcai-solving/) doi:10.24963/IJCAI.2021/575

BibTeX

@inproceedings{tomasek2021ijcai-solving,
  title     = {{Solving Partially Observable Stochastic Shortest-Path Games}},
  author    = {Tomásek, Petr and Horák, Karel and Aradhye, Aditya and Bosanský, Branislav and Chatterjee, Krishnendu},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {4182-4189},
  doi       = {10.24963/IJCAI.2021/575},
  url       = {https://mlanthology.org/ijcai/2021/tomasek2021ijcai-solving/}
}