Stick-Breaking Policy Learning in Dec-POMDPs

Abstract

Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to local maxima that are far from the optimal value. This paper considers a variable-size FSC to represent the local policy of each agent. These variable-size FSCs are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
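The stick-breaking prior mentioned in the abstract generates a countably infinite sequence of mixture weights by repeatedly breaking off Beta-distributed fractions of a unit-length stick; in Dec-SBPR these weights govern how many FSC nodes are effectively used. A minimal sketch of the standard stick-breaking construction (the function name, `alpha` concentration parameter, and truncation level are illustrative, not from the paper):

```python
import numpy as np

def stick_breaking_weights(alpha, num_sticks, seed=None):
    """Draw a truncated stick-breaking weight vector.

    v_k ~ Beta(1, alpha); w_k = v_k * prod_{j<k} (1 - v_j).
    Smaller alpha concentrates mass on the first few sticks,
    which corresponds to favoring smaller controllers.
    """
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=num_sticks)
    # Length of stick remaining before each break: 1, (1-v_1), (1-v_1)(1-v_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

weights = stick_breaking_weights(alpha=1.0, num_sticks=10, seed=0)
# weights are nonnegative and sum to less than 1 (mass beyond the
# truncation level is discarded)
```

In a nonparametric policy representation, such weights let the effective controller size be inferred from data rather than fixed in advance.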

Cite

Text

Liu et al. "Stick-Breaking Policy Learning in Dec-POMDPs." International Joint Conference on Artificial Intelligence, 2015.

Markdown

[Liu et al. "Stick-Breaking Policy Learning in Dec-POMDPs." International Joint Conference on Artificial Intelligence, 2015.](https://mlanthology.org/ijcai/2015/liu2015ijcai-stick/)

BibTeX

@inproceedings{liu2015ijcai-stick,
  title     = {{Stick-Breaking Policy Learning in Dec-POMDPs}},
  author    = {Liu, Miao and Amato, Christopher and Liao, Xuejun and Carin, Lawrence and How, Jonathan P.},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2015},
  pages     = {2011--2018},
  url       = {https://mlanthology.org/ijcai/2015/liu2015ijcai-stick/}
}