Approximation Algorithms for Observer Aware MDPs

Shuwa Miura, Olivier Buffet, Shlomo Zilberstein

UAI 2024 pp. 2573-2586

/uai/2024/miura2024uai-approximation/

Abstract

We present approximation algorithms for Observer-Aware Markov Decision Processes (OAMDPs). OAMDPs model sequential decision-making problems in which rewards depend on the beliefs of an observer about the goals, intentions, or capabilities of the observed agent. The first proposed algorithm is a grid-based value iteration (Grid-VI), which discretizes the observer’s belief into regular grids. Based on the same discretization, the second proposed algorithm is a variant of Real-Time Dynamic Programming (RTDP) called Grid-RTDP. Unlike Grid-Vi, Grid-RTDP focuses its updates on promising states using heuristic estimates. We provide theoretical guarantees of the proposed algorithms and demonstrate that Grid-RTDP has a good anytime performance comparable to the existing approach without performance guarantees.

PDF UAI OpenReview Semantic Scholar

Cite

Text

Miura et al. "Approximation Algorithms for Observer Aware MDPs." Uncertainty in Artificial Intelligence, 2024.

Markdown

[Miura et al. "Approximation Algorithms for Observer Aware MDPs." Uncertainty in Artificial Intelligence, 2024.](https://mlanthology.org/uai/2024/miura2024uai-approximation/)

BibTeX

@inproceedings{miura2024uai-approximation,
  title     = {{Approximation Algorithms for Observer Aware MDPs}},
  author    = {Miura, Shuwa and Buffet, Olivier and Zilberstein, Shlomo},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2024},
  pages     = {2573-2586},
  volume    = {244},
  url       = {https://mlanthology.org/uai/2024/miura2024uai-approximation/}
}