Learning Policies for Partially Observable Environments: Scaling up

Abstract

Partially observable Markov decision processes (POMDPs) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of POMDPs is motivated by a need to address realistic problems, existing techniques for finding optimal behavior do not appear to scale well and have been unable to find satisfactory policies for problems with more than a dozen states. After a brief review of POMDPs, this paper discusses several simple solution methods and shows that all are capable of finding near-optimal policies for a selection of extremely small POMDPs taken from the learning literature. In contrast, we show that none are able to solve a slightly larger and noisier problem based on robot navigation. We find that a combination of two novel approaches performs well on these problems and suggest methods for scaling to even larger and more complicated domains.
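To make the POMDP setting concrete, here is a minimal sketch of the standard belief-state update an agent performs after acting and observing. This is generic background, not the paper's own algorithm; the two-state transition and observation matrices below are invented for illustration.

```python
# Belief-state update for a toy two-state POMDP (illustrative only;
# the matrices below are invented, not taken from the paper).

def belief_update(belief, action, obs, T, O):
    """Bayes update: b'(s') proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    n = len(belief)
    new_b = [
        O[action][s2][obs] * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    z = sum(new_b)  # normalizing constant = Pr(obs | belief, action)
    return [p / z for p in new_b]

# Two states, one "listen" action, two observations: a noisy sensor
# that reports the true state 85% of the time.
T = [[[1.0, 0.0], [0.0, 1.0]]]      # listening leaves the state unchanged
O = [[[0.85, 0.15], [0.15, 0.85]]]  # observation model
b = [0.5, 0.5]                      # uniform prior over the two states
b = belief_update(b, action=0, obs=0, T=T, O=O)
print(b)  # belief shifts toward state 0 after observing evidence for it
```

Because the belief is a sufficient statistic for the history, an optimal policy can in principle be defined over beliefs rather than raw observation sequences, which is what makes exact POMDP solution methods expensive as the state space grows.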

Cite

Text

Littman et al. "Learning Policies for Partially Observable Environments: Scaling up." International Conference on Machine Learning, 1995. doi:10.1016/B978-1-55860-377-6.50052-9

Markdown

[Littman et al. "Learning Policies for Partially Observable Environments: Scaling up." International Conference on Machine Learning, 1995.](https://mlanthology.org/icml/1995/littman1995icml-learning/) doi:10.1016/B978-1-55860-377-6.50052-9

BibTeX

@inproceedings{littman1995icml-learning,
  title     = {{Learning Policies for Partially Observable Environments: Scaling up}},
  author    = {Littman, Michael L. and Cassandra, Anthony R. and Kaelbling, Leslie Pack},
  booktitle = {International Conference on Machine Learning},
  year      = {1995},
  pages     = {362--370},
  doi       = {10.1016/B978-1-55860-377-6.50052-9},
  url       = {https://mlanthology.org/icml/1995/littman1995icml-learning/}
}