Planning and Learning for Decentralized MDPs with Event Driven Rewards

Abstract

Decentralized (PO)MDPs provide a rigorous framework for sequential multiagent decision making under uncertainty. However, their high computational complexity limits their practical impact. To address scalability and real-world impact, we focus on settings where a large number of agents primarily interact through complex joint rewards that depend on their entire histories of states and actions. Such history-based rewards encapsulate the notion of events or tasks, so that the team reward is given only when the joint task is completed. Algorithmically, we make three contributions: 1) a nonlinear programming (NLP) formulation for such an event-based planning model; 2) a probabilistic inference based approach that scales much better than NLP solvers for a large number of agents; 3) a policy gradient based multiagent reinforcement learning approach that scales well even for exponential state spaces. Our inference and RL-based advances enable us to solve a large real-world multiagent coverage problem modeling schedule coordination of agents in a real urban subway network, where other approaches fail to scale.
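
The policy-gradient contribution is easiest to picture with a toy example. The sketch below is an illustrative assumption, not the authors' algorithm or code: it trains independent REINFORCE-style softmax policies for a few agents whose only feedback is an event-driven team reward, paid out only when every agent completes its local task within the horizon. The environment, reward value, and all hyperparameters are invented for illustration.

```python
# Minimal sketch (assumed, not from the paper): REINFORCE with a shared
# event-driven team reward. Each agent gets reward 1.0 only if ALL agents
# have visited their target state by the end of the episode.
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, N_STATES, N_ACTIONS, HORIZON = 3, 5, 2, 6
TARGET_STATE = N_STATES - 1  # each agent must reach this state at least once

# Independent policies: one table of logits per agent (decentralized control).
logits = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(N_AGENTS)]

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(state, action):
    # Toy local dynamics: action 1 moves forward with prob 0.8, action 0 stays.
    if action == 1 and rng.random() < 0.8:
        return min(state + 1, N_STATES - 1)
    return state

def run_episode():
    states = [0] * N_AGENTS
    visited_target = [False] * N_AGENTS
    trajs = [[] for _ in range(N_AGENTS)]  # per-agent (state, action) histories
    for _ in range(HORIZON):
        for i in range(N_AGENTS):
            probs = softmax(logits[i][states[i]])
            a = rng.choice(N_ACTIONS, p=probs)
            trajs[i].append((states[i], a))
            states[i] = step(states[i], a)
            visited_target[i] |= (states[i] == TARGET_STATE)
    # Event-driven team reward: paid only when the joint task is completed,
    # i.e. every agent has reached its target during the episode.
    team_reward = 1.0 if all(visited_target) else 0.0
    return trajs, team_reward

def train(iterations=2000, lr=0.1, batch=20):
    for it in range(iterations):
        grads = [np.zeros_like(l) for l in logits]
        avg_r = 0.0
        for _ in range(batch):
            trajs, r = run_episode()
            avg_r += r / batch
            for i in range(N_AGENTS):
                for s, a in trajs[i]:
                    probs = softmax(logits[i][s])
                    # grad of log softmax w.r.t. logits = onehot(a) - probs,
                    # scaled by the (undiscounted) team reward.
                    g = -probs
                    g[a] += 1.0
                    grads[i][s] += r * g
        for i in range(N_AGENTS):
            logits[i] += lr * grads[i] / batch
        if it % 500 == 0:
            print(f"iter {it}: avg team reward {avg_r:.2f}")

if __name__ == "__main__":
    train()
```

This sketch omits variance-reduction baselines and the factored inference machinery the paper relies on for scalability; it only shows how a history-dependent, all-or-nothing team reward can drive independent per-agent policy updates.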

Cite

Text

Gupta et al. "Planning and Learning for Decentralized MDPs with Event Driven Rewards." AAAI Conference on Artificial Intelligence, 2018. doi:10.1609/AAAI.V32I1.12096

Markdown

[Gupta et al. "Planning and Learning for Decentralized MDPs with Event Driven Rewards." AAAI Conference on Artificial Intelligence, 2018.](https://mlanthology.org/aaai/2018/gupta2018aaai-planning/) doi:10.1609/AAAI.V32I1.12096

BibTeX

@inproceedings{gupta2018aaai-planning,
  title     = {{Planning and Learning for Decentralized MDPs with Event Driven Rewards}},
  author    = {Gupta, Tarun and Kumar, Akshat and Paruchuri, Praveen},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {6186-6194},
  doi       = {10.1609/AAAI.V32I1.12096},
  url       = {https://mlanthology.org/aaai/2018/gupta2018aaai-planning/}
}