Analyzing and Escaping Local Optima in Planning as Inference for Partially Observable Domains

Abstract

Planning as inference has recently emerged as a versatile approach to decision-theoretic planning and reinforcement learning for single- and multi-agent systems in fully and partially observable domains with discrete and continuous variables. Since planning as inference essentially tackles a non-convex optimization problem when the states are partially observable, there is a need to develop techniques that can robustly escape local optima. We investigate the local optima of finite-state controllers in single-agent partially observable Markov decision processes (POMDPs) that are optimized by expectation maximization (EM). We show that EM converges to controllers that are optimal with respect to a one-step lookahead. To escape local optima, we propose two algorithms: the first adds nodes to the controller to ensure optimality with respect to a multi-step lookahead, while the second splits nodes in a greedy fashion to improve reward likelihood. The approaches are demonstrated empirically on benchmark problems.
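
For readers unfamiliar with the object being optimized, the following is a minimal sketch of what a stochastic finite-state controller is and how its expected discounted return can be computed for a small discrete POMDP. It is not the paper's EM procedure or either of the proposed escape algorithms; it only illustrates the quantity those methods try to improve. The function name `evaluate_controller`, the array layouts, the assumption that node transitions depend only on the observation, and the toy problem are all assumptions made for illustration.

```python
def evaluate_controller(T, Z, R, pi, lam, gamma, tol=1e-10, max_iter=100000):
    """Iteratively evaluate a stochastic finite-state controller (illustrative sketch).

    Assumed layouts (hypothetical, chosen for this example):
      T[s][a][s2]   : probability of moving from state s to s2 under action a
      Z[a][s2][o]   : probability of observing o after action a leads to s2
      R[s][a]       : immediate reward
      pi[n][a]      : probability that controller node n selects action a
      lam[n][o][n2] : probability of moving from node n to n2 on observation o
    Returns V[n][s], the expected discounted return from node n in state s.
    """
    n_nodes, n_states = len(pi), len(T)
    n_actions, n_obs = len(R[0]), len(Z[0][0])
    V = [[0.0] * n_states for _ in range(n_nodes)]
    for _ in range(max_iter):
        new_V = [[0.0] * n_states for _ in range(n_nodes)]
        for n in range(n_nodes):
            for s in range(n_states):
                total = 0.0
                for a in range(n_actions):
                    future = 0.0
                    for s2 in range(n_states):
                        for o in range(n_obs):
                            # Expected value of the successor node in state s2.
                            cont = sum(lam[n][o][n2] * V[n2][s2]
                                       for n2 in range(n_nodes))
                            future += T[s][a][s2] * Z[a][s2][o] * cont
                    total += pi[n][a] * (R[s][a] + gamma * future)
                new_V[n][s] = total
        delta = max(abs(new_V[n][s] - V[n][s])
                    for n in range(n_nodes) for s in range(n_states))
        V = new_V
        if delta < tol:
            break
    return V


# Toy problem (an assumption, not one of the paper's benchmarks): two static
# states, two actions, and an observation that reveals the state exactly.
# Choosing the action that matches the state earns reward 1, otherwise 0.
T = [[[1.0, 0.0], [1.0, 0.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
Z = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [0.0, 1.0]]

# Two-node controller: node n plays action n and jumps to the node named by
# the observation it receives.
pi = [[1.0, 0.0],
      [0.0, 1.0]]
lam = [[[1.0, 0.0], [0.0, 1.0]],
       [[1.0, 0.0], [0.0, 1.0]]]

V = evaluate_controller(T, Z, R, pi, lam, gamma=0.9)
```

In this toy case the controller acts correctly from the second step onward, so starting in the matching node is worth 1 / (1 - 0.9) = 10 and starting in the wrong node is worth 0.9 * 10 = 9. A controller with a single node could not do this, which gives some intuition for why adding or splitting nodes can raise the attainable value.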

Cite

Text

Poupart et al. "Analyzing and Escaping Local Optima in Planning as Inference for Partially Observable Domains." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2011. doi:10.1007/978-3-642-23783-6_39

Markdown

[Poupart et al. "Analyzing and Escaping Local Optima in Planning as Inference for Partially Observable Domains." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2011.](https://mlanthology.org/ecmlpkdd/2011/poupart2011ecmlpkdd-analyzing/) doi:10.1007/978-3-642-23783-6_39

BibTeX

@inproceedings{poupart2011ecmlpkdd-analyzing,
  title     = {{Analyzing and Escaping Local Optima in Planning as Inference for Partially Observable Domains}},
  author    = {Poupart, Pascal and Lang, Tobias and Toussaint, Marc},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2011},
  pages     = {613--628},
  doi       = {10.1007/978-3-642-23783-6_39},
  url       = {https://mlanthology.org/ecmlpkdd/2011/poupart2011ecmlpkdd-analyzing/}
}