Learning Factored Representations for Partially Observable Markov Decision Processes

Abstract

The problem of reinforcement learning in a non-Markov environment is explored using a dynamic Bayesian network, where conditional independence assumptions between random variables are compactly represented by network parameters. The parameters are learned on-line, and approximations are used to perform inference and to compute the optimal value function. The relative effects of inference and value function approximations on the quality of the final policy are investigated, by learning to solve a moderately difficult driving task. The two value function approximations, linear and quadratic, were found to perform similarly, but the quadratic model was more sensitive to initialization. Both performed below the level of human performance on the task. The dynamic Bayesian network performed comparably to a model using a localist hidden state representation, while requiring exponentially fewer parameters.
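To illustrate the parameter-count contrast mentioned in the abstract, here is a minimal sketch (not the paper's code; all names, shapes, and the choice of K are illustrative assumptions): a factored belief over K binary state variables supports a linear value approximation with O(K) parameters or a quadratic one with O(K^2), whereas a localist representation enumerates all 2^K joint states.

import numpy as np

K = 8                                   # number of binary state factors (assumed)
rng = np.random.default_rng(0)

belief = rng.random(K)                  # factored belief: P(s_k = 1) for each factor

# Linear value approximation: V(b) = w . b + c, O(K) parameters.
w = rng.normal(size=K)
c = 0.0
v_linear = w @ belief + c

# Quadratic value approximation: V(b) = b' Q b + w . b + c, O(K^2) parameters.
Q = rng.normal(size=(K, K))
v_quadratic = belief @ Q @ belief + w @ belief + c

# A localist (flat) hidden-state model would instead enumerate all 2**K joint
# states, so even a tabular value function over it needs 2**K parameters.
print(f"factored linear params:    {K}")
print(f"factored quadratic params: {K * K + K}")
print(f"localist params:           {2 ** K}")
print(f"V_linear={v_linear:.3f}  V_quadratic={v_quadratic:.3f}")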

Cite

Text

Sallans. "Learning Factored Representations for Partially Observable Markov Decision Processes." Neural Information Processing Systems, 1999.

Markdown

[Sallans. "Learning Factored Representations for Partially Observable Markov Decision Processes." Neural Information Processing Systems, 1999.](https://mlanthology.org/neurips/1999/sallans1999neurips-learning/)

BibTeX

@inproceedings{sallans1999neurips-learning,
  title     = {{Learning Factored Representations for Partially Observable Markov Decision Processes}},
  author    = {Sallans, Brian},
  booktitle = {Neural Information Processing Systems},
  year      = {1999},
  pages     = {1050-1056},
  url       = {https://mlanthology.org/neurips/1999/sallans1999neurips-learning/}
}