Sufficient Plan-Time Statistics for Decentralized POMDPs

Abstract

Optimal decentralized decision making in a team of cooperative agents, as formalized by decentralized POMDPs, is a notoriously hard problem. A major obstacle is that the agents do not have access to a sufficient statistic during execution, which means that they need to base their actions on their histories of observations. A consequence is that even during off-line planning the choice of decision rules for different stages is tightly interwoven: decisions of earlier stages affect how to act optimally at later stages, and the optimal value function for a stage is known to depend on the decisions made up to that point. This paper contributes to the theory of decentralized POMDPs by showing how this dependence on the 'past joint policy' can be replaced by a sufficient statistic. These results are extended to the case of k-step delayed communication. The paper investigates the practical implications, as well as the effectiveness of a new pruning technique for MAA* methods, in a number of benchmark problems and discusses future avenues of research opened by these contributions.

Cite

Text

Oliehoek. "Sufficient Plan-Time Statistics for Decentralized POMDPs." International Joint Conference on Artificial Intelligence, 2013.

Markdown

[Oliehoek. "Sufficient Plan-Time Statistics for Decentralized POMDPs." International Joint Conference on Artificial Intelligence, 2013.](https://mlanthology.org/ijcai/2013/oliehoek2013ijcai-sufficient/)

BibTeX

@inproceedings{oliehoek2013ijcai-sufficient,
  title     = {{Sufficient Plan-Time Statistics for Decentralized POMDPs}},
  author    = {Oliehoek, Frans Adriaan},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2013},
  pages     = {302--308},
  url       = {https://mlanthology.org/ijcai/2013/oliehoek2013ijcai-sufficient/}
}