Bounded Policy Iteration for Decentralized POMDPs
Abstract
We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier’s bounded policy iteration for POMDPs.
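The abstract's core objects are straightforward to picture in code. Below is a minimal Python sketch of a joint stochastic finite-state controller with a correlation device, based only on the description above; every name here (`sample`, `LocalController`, `CorrelationDevice`) is hypothetical, and the exact conditioning of the distributions is an assumption, not the paper's specification.

```python
import random

def sample(dist):
    """Draw one outcome from a {outcome: probability} dict."""
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs)[0]

class LocalController:
    """One agent's stochastic finite-state controller (hypothetical sketch).

    act_dist[(c, q)]       -> {action: prob}, given correlation signal c
                              and local controller node q.
    node_dist[(c, q, obs)] -> {next_node: prob}, after observing obs.
    """
    def __init__(self, act_dist, node_dist):
        self.act_dist = act_dist
        self.node_dist = node_dist

    def act(self, c, q):
        return sample(self.act_dist[(c, q)])

    def next_node(self, c, q, obs):
        return sample(self.node_dist[(c, q, obs)])

class CorrelationDevice:
    """Shared random signal with its own stochastic transition function.

    The signal evolves independently of the agents' actions and
    observations, so each agent can condition on it without any
    execution-time communication.
    """
    def __init__(self, sig_dist):
        self.sig_dist = sig_dist  # sig_dist[c] -> {next_c: prob}

    def step(self, c):
        return sample(self.sig_dist[c])
```

Because the device's signal never depends on what the agents see or do (one common realization is a shared random seed fixed before execution), correlated randomization comes for free at run time; the iteration step then only has to re-optimize the distributions above under a fixed memory budget.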
Cite
Text
Bernstein et al. "Bounded Policy Iteration for Decentralized POMDPs." International Joint Conference on Artificial Intelligence, 2005.
Markdown
[Bernstein et al. "Bounded Policy Iteration for Decentralized POMDPs." International Joint Conference on Artificial Intelligence, 2005.](https://mlanthology.org/ijcai/2005/bernstein2005ijcai-bounded/)
BibTeX
@inproceedings{bernstein2005ijcai-bounded,
title = {{Bounded Policy Iteration for Decentralized POMDPs}},
author = {Bernstein, Daniel S. and Hansen, Eric A. and Zilberstein, Shlomo},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2005},
pages = {1287--1292},
url = {https://mlanthology.org/ijcai/2005/bernstein2005ijcai-bounded/}
}