Sufficient Plan-Time Statistics for Decentralized POMDPs
Abstract
Optimal decentralized decision making in a team of cooperative agents as formalized by decentralized POMDPs is a notoriously hard problem. A major obstacle is that the agents do not have access to a sufficient statistic during execution, which means that they need to base their actions on their histories of observations. A consequence is that even during off-line planning the choice of decision rules for different stages is tightly interwoven: decisions of earlier stages affect how to act optimally at later stages, and the optimal value function for a stage is known to have a dependence on the decisions made up to that point. This paper makes a contribution to the theory of decentralized POMDPs by showing how this dependence on the 'past joint policy' can be replaced by a sufficient statistic. These results are extended to the case of k-step delayed communication. The paper investigates the practical implications, as well as the effectiveness of a new pruning technique for MAA* methods, in a number of benchmark problems and discusses future avenues of research opened by these contributions.
Cite
Text
Oliehoek. "Sufficient Plan-Time Statistics for Decentralized POMDPs." International Joint Conference on Artificial Intelligence, 2013.
Markdown
[Oliehoek. "Sufficient Plan-Time Statistics for Decentralized POMDPs." International Joint Conference on Artificial Intelligence, 2013.](https://mlanthology.org/ijcai/2013/oliehoek2013ijcai-sufficient/)
BibTeX
@inproceedings{oliehoek2013ijcai-sufficient,
title = {{Sufficient Plan-Time Statistics for Decentralized POMDPs}},
author = {Oliehoek, Frans Adriaan},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2013},
pages = {302--308},
url = {https://mlanthology.org/ijcai/2013/oliehoek2013ijcai-sufficient/}
}