Optimal and Approximate Q-Value Functions for Decentralized POMDPs

Abstract

Decision-theoretic planning is a popular approach to sequential decision-making problems because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed recursively by dynamic programming, and an optimal policy is then extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy, and another that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound on the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
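
As a point of reference for the single-agent case mentioned in the abstract, the following is a minimal sketch of computing a finite-horizon Q* by backward dynamic programming and then extracting a greedy policy from it. It covers only a fully observable MDP, not the Dec-POMDP constructions studied in the paper. The two-state toy problem and the names `optimal_q_values`, `greedy_policy`, `TRANSITIONS`, and `REWARDS` are illustrative assumptions and do not come from the paper.

```python
# Sketch only: finite-horizon Q* for a tiny, hypothetical single-agent MDP.
# TRANSITIONS[s][a] is a probability distribution over next states.
# REWARDS[s][a] is the immediate reward for taking action a in state s.
TRANSITIONS = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to state 0
]
REWARDS = [
    [0.0, -1.0],  # state 0
    [2.0, 0.0],   # state 1
]


def optimal_q_values(transitions, rewards, horizon):
    """Return q[t][s][a] for t = 0..horizon-1 via backward induction."""
    n_states = len(rewards)
    n_actions = len(rewards[0])
    next_values = [0.0] * n_states  # value after the final stage is zero
    q = []
    for _ in range(horizon):  # iterate from the last stage back to the first
        stage_q = [
            [
                rewards[s][a]
                + sum(p * v for p, v in zip(transitions[s][a], next_values))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        q.append(stage_q)
        # V_t(s) = max_a Q_t(s, a) feeds the backup for the previous stage.
        next_values = [max(row) for row in stage_q]
    q.reverse()  # stages were built last-to-first
    return q


def greedy_policy(q):
    """Extract policy[t][s] = argmax_a q[t][s][a]."""
    return [
        [max(range(len(row)), key=row.__getitem__) for row in stage_q]
        for stage_q in q
    ]


if __name__ == "__main__":
    q_star = optimal_q_values(TRANSITIONS, REWARDS, horizon=3)
    print(q_star[0])
    print(greedy_policy(q_star))
```

In this toy problem the greedy policy moves from state 0 to state 1 while enough stages remain to recover the movement cost, and stays put at the final stage. In the decentralized setting each agent acts only on its own observations, so this simple argmax over a shared state is not directly available.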

Cite

Text

Oliehoek et al. "Optimal and Approximate Q-Value Functions for Decentralized POMDPs." Journal of Artificial Intelligence Research, 2008. doi:10.1613/JAIR.2447

Markdown

[Oliehoek et al. "Optimal and Approximate Q-Value Functions for Decentralized POMDPs." Journal of Artificial Intelligence Research, 2008.](https://mlanthology.org/jair/2008/oliehoek2008jair-optimal/) doi:10.1613/JAIR.2447

BibTeX

@article{oliehoek2008jair-optimal,
  title     = {{Optimal and Approximate Q-Value Functions for Decentralized POMDPs}},
  author    = {Oliehoek, Frans A. and Spaan, Matthijs T. J. and Vlassis, Nikos},
  journal   = {Journal of Artificial Intelligence Research},
  year      = {2008},
  pages     = {289--353},
  doi       = {10.1613/JAIR.2447},
  volume    = {32},
  url       = {https://mlanthology.org/jair/2008/oliehoek2008jair-optimal/}
}