Bounded Policy Iteration for Decentralized POMDPs
Abstract
We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier’s bounded policy iteration for POMDPs.
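The abstract's core objects are straightforward to picture in code. Below is a minimal Python sketch of a joint stochastic finite-state controller with a correlation device, based only on the description above; every name here (`sample`, `LocalController`, `CorrelationDevice`) is hypothetical, and the exact conditioning of the distributions is an assumption, not the paper's specification.

```python
import random

def sample(dist):
    """Draw one outcome from a {outcome: probability} dict."""
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs)[0]

class LocalController:
    """One agent's stochastic finite-state controller (hypothetical sketch).

    act_dist[(c, q)]       -> {action: prob}, given correlation signal c
                              and local controller node q.
    node_dist[(c, q, obs)] -> {next_node: prob}, after observing obs.
    """
    def __init__(self, act_dist, node_dist):
        self.act_dist = act_dist
        self.node_dist = node_dist

    def act(self, c, q):
        return sample(self.act_dist[(c, q)])

    def next_node(self, c, q, obs):
        return sample(self.node_dist[(c, q, obs)])

class CorrelationDevice:
    """Shared random signal with its own stochastic transition function.

    The signal evolves independently of the agents' actions and
    observations, so each agent can condition on it without any
    execution-time communication.
    """
    def __init__(self, sig_dist):
        self.sig_dist = sig_dist  # sig_dist[c] -> {next_c: prob}

    def step(self, c):
        return sample(self.sig_dist[c])
```

Because the device's signal never depends on what the agents see or do (one common realization is a shared random seed fixed before execution), correlated randomization comes for free at run time; the iteration step then only has to re-optimize the distributions above under a fixed memory budget.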
Cite
Text
Bernstein et al. "Bounded Policy Iteration for Decentralized POMDPs." International Joint Conference on Artificial Intelligence, 2005.
Markdown
[Bernstein et al. "Bounded Policy Iteration for Decentralized POMDPs." International Joint Conference on Artificial Intelligence, 2005.](https://mlanthology.org/ijcai/2005/bernstein2005ijcai-bounded/)
BibTeX
@inproceedings{bernstein2005ijcai-bounded,
title = {{Bounded Policy Iteration for Decentralized POMDPs}},
author = {Bernstein, Daniel S. and Hansen, Eric A. and Zilberstein, Shlomo},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2005},
pages = {1287--1292},
url = {https://mlanthology.org/ijcai/2005/bernstein2005ijcai-bounded/}
}