Active Learning in Partially Observable Markov Decision Processes

Abstract

This paper examines the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is not known or is only poorly specified. We propose two approaches to this problem. The first relies on a model of the uncertainty that is added directly into the POMDP planning problem. This has theoretical guarantees, but is impractical when many of the parameters are uncertain. The second, called MEDUSA, incrementally improves the POMDP model using selected queries, while still optimizing reward. Results show good performance of the algorithm even in large problems: the most useful parameters of the model are learned quickly and the agent still accumulates high reward throughout the process.
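The core idea behind MEDUSA, as the abstract describes it, is to keep an explicit distribution over uncertain model parameters and refine it with selected queries. A common way to represent such uncertainty over transition probabilities is with Dirichlet distributions. The sketch below is a minimal, hypothetical illustration of that query-and-update loop, not the paper's actual algorithm: the class names, the single `(s, a)` row being learned, and the oracle that directly reveals the successor state are all assumptions made for clarity.

```python
import random


class DirichletTransitionModel:
    """Dirichlet counts over an uncertain transition distribution T(s'|s,a).

    Hypothetical illustration: model uncertainty is kept as Dirichlet
    parameters, which concentrate as oracle queries accumulate.
    """

    def __init__(self, n_states, n_actions, prior=1.0):
        self.n_states = n_states
        self.counts = [[[prior] * n_states for _ in range(n_actions)]
                       for _ in range(n_states)]

    def sample_row(self, s, a):
        # One sampled model: a Dirichlet draw via normalized Gamma variates.
        draws = [random.gammavariate(al, 1.0) for al in self.counts[s][a]]
        total = sum(draws)
        return [d / total for d in draws]

    def mean_row(self, s, a):
        # Posterior mean estimate of T(.|s,a).
        total = sum(self.counts[s][a])
        return [al / total for al in self.counts[s][a]]

    def update(self, s, a, s_next):
        # A query reveals the true successor state; count it.
        self.counts[s][a][s_next] += 1.0


def run_queries(model, true_row, s, a, n_queries, rng):
    # Active-learning loop (simplified): each query asks an oracle for the
    # true successor state and folds the answer into the posterior.
    for _ in range(n_queries):
        s_next = rng.choices(range(len(true_row)), weights=true_row)[0]
        model.update(s, a, s_next)
```

As a usage example, querying one state-action pair 500 times against a true distribution of `[0.7, 0.2, 0.1]` drives the posterior mean close to that distribution, matching the abstract's claim that the most useful parameters can be learned quickly from queries.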

Cite

Text

Jaulmes et al. "Active Learning in Partially Observable Markov Decision Processes." European Conference on Machine Learning, 2005. doi:10.1007/11564096_59

Markdown

[Jaulmes et al. "Active Learning in Partially Observable Markov Decision Processes." European Conference on Machine Learning, 2005.](https://mlanthology.org/ecmlpkdd/2005/jaulmes2005ecml-active/) doi:10.1007/11564096_59

BibTeX

@inproceedings{jaulmes2005ecml-active,
  title     = {{Active Learning in Partially Observable Markov Decision Processes}},
  author    = {Jaulmes, Robin and Pineau, Joelle and Precup, Doina},
  booktitle = {European Conference on Machine Learning},
  year      = {2005},
  pages     = {601--608},
  doi       = {10.1007/11564096_59},
  url       = {https://mlanthology.org/ecmlpkdd/2005/jaulmes2005ecml-active/}
}