Approximation Algorithms for Observer Aware MDPs

Abstract

We present approximation algorithms for Observer-Aware Markov Decision Processes (OAMDPs). OAMDPs model sequential decision-making problems in which rewards depend on the beliefs of an observer about the goals, intentions, or capabilities of the observed agent. The first proposed algorithm is a grid-based value iteration (Grid-VI), which discretizes the observer’s belief into regular grids. Based on the same discretization, the second proposed algorithm is a variant of Real-Time Dynamic Programming (RTDP) called Grid-RTDP. Unlike Grid-Vi, Grid-RTDP focuses its updates on promising states using heuristic estimates. We provide theoretical guarantees of the proposed algorithms and demonstrate that Grid-RTDP has a good anytime performance comparable to the existing approach without performance guarantees.

Cite

Text

Miura et al. "Approximation Algorithms for Observer Aware MDPs." Uncertainty in Artificial Intelligence, 2024.

Markdown

[Miura et al. "Approximation Algorithms for Observer Aware MDPs." Uncertainty in Artificial Intelligence, 2024.](https://mlanthology.org/uai/2024/miura2024uai-approximation/)

BibTeX

@inproceedings{miura2024uai-approximation,
  title     = {{Approximation Algorithms for Observer Aware MDPs}},
  author    = {Miura, Shuwa and Buffet, Olivier and Zilberstein, Shlomo},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2024},
  pages     = {2573-2586},
  volume    = {244},
  url       = {https://mlanthology.org/uai/2024/miura2024uai-approximation/}
}