An Improved Policy Iteration Algorithm for Partially Observable MDPs

Abstract

A new policy iteration algorithm for partially observable Markov decision processes is presented that is simpler and more efficient than an earlier policy iteration algorithm of Sondik (1971, 1978). The key simplification is representation of a policy as a finite-state controller. This representation makes policy evaluation straightforward. The paper's contribution is to show that the dynamic-programming update used in the policy improvement step can be interpreted as the transformation of a finite-state controller into an improved finite-state controller. The new algorithm consistently outperforms value iteration as an approach to solving infinite-horizon problems.
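To illustrate why representing the policy as a finite-state controller makes policy evaluation straightforward: the value of each controller node in each hidden state satisfies a finite system of linear equations, which can be solved directly. The sketch below is a minimal illustration of that evaluation step, assuming NumPy and hypothetical array layouts (P, O, R, node_action, node_succ); it is not code from the paper.

```python
import numpy as np

def evaluate_fsc(P, O, R, gamma, node_action, node_succ):
    """Policy evaluation for a finite-state controller acting in a POMDP.

    Hypothetical inputs (names are illustrative, not from the paper):
      P[a, s, s2]      -- state-transition probabilities
      O[a, s2, o]      -- observation probabilities
      R[s, a]          -- expected immediate reward
      node_action[n]   -- action selected by controller node n
      node_succ[n, o]  -- successor node after observing o in node n

    Returns V[n, s], the value of occupying controller node n in hidden
    state s, obtained by solving the linear system
      V[n, s] = R[s, a_n]
                + gamma * sum_{s2, o} P[a_n, s, s2] * O[a_n, s2, o] * V[succ(n, o), s2].
    """
    num_nodes = len(node_action)
    num_states = R.shape[0]
    num_obs = O.shape[2]
    dim = num_nodes * num_states

    # Build (I - gamma * T) V = b, one row per (node, state) pair.
    A = np.eye(dim)
    b = np.zeros(dim)
    for n in range(num_nodes):
        a = node_action[n]
        for s in range(num_states):
            row = n * num_states + s
            b[row] = R[s, a]
            for s2 in range(num_states):
                for o in range(num_obs):
                    col = node_succ[n, o] * num_states + s2
                    A[row, col] -= gamma * P[a, s, s2] * O[a, s2, o]

    return np.linalg.solve(A, b).reshape(num_nodes, num_states)
```

The value of the controller for any belief state is then the belief-weighted combination of these node values, which is what the policy improvement (dynamic-programming update) step operates on.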

Cite

Text

Hansen. "An Improved Policy Iteration Algorithm for Partially Observable MDPs." Neural Information Processing Systems, 1997.

Markdown

[Hansen. "An Improved Policy Iteration Algorithm for Partially Observable MDPs." Neural Information Processing Systems, 1997.](https://mlanthology.org/neurips/1997/hansen1997neurips-improved/)

BibTeX

@inproceedings{hansen1997neurips-improved,
  title     = {{An Improved Policy Iteration Algorithm for Partially Observable MDPs}},
  author    = {Hansen, Eric A.},
  booktitle = {Neural Information Processing Systems},
  year      = {1997},
  pages     = {1015-1021},
  url       = {https://mlanthology.org/neurips/1997/hansen1997neurips-improved/}
}