Reinforcement Learning Using Approximate Belief States
Abstract
The problem of developing good policies for partially observable Markov decision problems (POMDPs) remains one of the most challenging areas of research in stochastic planning. One line of research in this area involves the use of reinforcement learning with belief states, probability distributions over the underlying model states. This is a promising method for small problems, but its application is limited by the intractability of computing or representing a full belief state for large problems. Recent work shows that, in many settings, we can maintain an approximate belief state, which is fairly close to the true belief state. In particular, great success has been shown with approximate belief states that marginalize out correlations between state variables. In this paper, we investigate two methods of full belief state reinforcement learning and one novel method for reinforcement learning using factored approximate belief states. We compare the performance of these algorithms on several well-known problems from the literature. Our results demonstrate the importance of approximate belief state representations for large problems.
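For context, the operation underlying both approaches is the belief-state update, and the factored approximation replaces the joint belief with independent per-variable marginals. The sketch below is a rough illustration under those assumptions, not the authors' implementation; the array layout `T[s, a, s']`, `O[s, a, o]` and the helper names (`belief_update`, `project_to_marginals`, `recombine`) are hypothetical.

```python
import numpy as np

def belief_update(b, T, O, a, o):
    # Exact update: b'(s') is proportional to O(s', a, o) * sum_s T(s, a, s') * b(s)
    b_pred = T[:, a, :].T @ b      # prediction step over all next states s'
    b_new = O[:, a, o] * b_pred    # weight by the observation likelihood
    return b_new / b_new.sum()     # renormalize to a probability distribution

def project_to_marginals(b_joint, var_sizes):
    # Factored approximation: keep only per-variable marginals,
    # discarding correlations between state variables.
    b = b_joint.reshape(var_sizes)
    return [b.sum(axis=tuple(j for j in range(len(var_sizes)) if j != i))
            for i in range(len(var_sizes))]

def recombine(marginals):
    # Rebuild an approximate joint belief as the product of the marginals.
    b = np.array([1.0])
    for m in marginals:
        b = np.outer(b, m).ravel()
    return b
```

An agent using the factored representation would alternate `belief_update` on the recombined joint with `project_to_marginals`, so the stored belief grows with the number of state variables rather than the number of joint states (a practical version would also exploit a factored transition model instead of forming the joint explicitly).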
Cite
@inproceedings{rodriguez1999neurips-reinforcement,
title = {{Reinforcement Learning Using Approximate Belief States}},
author = {Rodriguez, Andres C. and Parr, Ronald and Koller, Daphne},
booktitle = {Neural Information Processing Systems},
year = {1999},
pages = {1036--1042},
url = {https://mlanthology.org/neurips/1999/rodriguez1999neurips-reinforcement/}
}