Informed Initial Policies for Learning in Dec-POMDPs

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators, and local information. Prevalent Dec-POMDP solution techniques have mostly been centralized and have assumed knowledge of the model. In real-world scenarios, however, solving centrally may not be an option and model parameters may be unknown. To address this, we propose a distributed, model-free algorithm for learning Dec-POMDP policies, in which agents take turns learning, with each agent not currently learning following a static policy. For agents that have not yet learned a policy, this static policy must be initialized. We propose a principled method for learning such initial policies through interaction with the environment. We show that by using such informed initial policies, our alternate learning algorithm can find near-optimal policies for two benchmark problems.
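The alternating scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses a toy one-shot cooperative game in place of a Dec-POMDP (no states or observations), plain Q-learning for the per-turn learner, and a hypothetical "informed" initialization that seeds Q-values with optimistic return estimates gathered from a phase of random joint exploration. The names `informed_init` and `alternate_learning` are illustrative, not from the paper.

```python
import random

# Toy cooperative one-shot game standing in for a Dec-POMDP: the joint reward
# depends on both agents' actions, and each agent learns only from the scalar
# team reward. (The real setting adds states and noisy local observations.)
PAYOFF = {(0, 0): 5, (0, 1): 0, (1, 0): 0, (1, 1): 10}
ACTIONS = (0, 1)

def greedy(q):
    """Action with the highest Q-value (ties broken toward action 0)."""
    return max(ACTIONS, key=lambda a: q[a])

def informed_init(rng, samples=50):
    # Hypothetical stand-in for learning informed initial policies through
    # environment interaction: estimate each individual action's best-case
    # return under random joint exploration, and seed the Q-tables with it.
    q = [{a: 0.0 for a in ACTIONS} for _ in range(2)]
    for _ in range(samples):
        joint = (rng.choice(ACTIONS), rng.choice(ACTIONS))
        for i in range(2):
            q[i][joint[i]] = max(q[i][joint[i]], PAYOFF[joint])
    return q

def alternate_learning(informed, turns=4, episodes=200, alpha=0.2, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = informed_init(rng) if informed else [{a: 0.0 for a in ACTIONS} for _ in range(2)]
    for turn in range(turns):
        learner, other = turn % 2, 1 - turn % 2   # agents take turns learning
        for _ in range(episodes):
            a_l = rng.choice(ACTIONS) if rng.random() < eps else greedy(q[learner])
            a_o = greedy(q[other])                # non-learner holds a static policy
            joint = (a_l, a_o) if learner == 0 else (a_o, a_l)
            q[learner][a_l] += alpha * (PAYOFF[joint] - q[learner][a_l])
    return greedy(q[0]), greedy(q[1])
```

Starting from zero-initialized Q-values, the alternation can lock onto the inferior (0, 0) equilibrium (payoff 5), since each learner best-responds to the other's initial static choice; the optimistic seeding steers both static policies toward the coordinated (1, 1) optimum before alternation begins, which is the intuition behind using informed initial policies.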

Cite

Text

Kraemer and Banerjee. "Informed Initial Policies for Learning in Dec-POMDPs." AAAI Conference on Artificial Intelligence, 2012. doi:10.1609/AAAI.V26I1.8426

Markdown

[Kraemer and Banerjee. "Informed Initial Policies for Learning in Dec-POMDPs." AAAI Conference on Artificial Intelligence, 2012.](https://mlanthology.org/aaai/2012/kraemer2012aaai-informed/) doi:10.1609/AAAI.V26I1.8426

BibTeX

@inproceedings{kraemer2012aaai-informed,
  title     = {{Informed Initial Policies for Learning in Dec-POMDPs}},
  author    = {Kraemer, Landon and Banerjee, Bikramjit},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2012},
  pages     = {2433-2434},
  doi       = {10.1609/AAAI.V26I1.8426},
  url       = {https://mlanthology.org/aaai/2012/kraemer2012aaai-informed/}
}