Learning Factored Representations for Partially Observable Markov Decision Processes
Abstract
The problem of reinforcement learning in a non-Markov environment is explored using a dynamic Bayesian network, where conditional independence assumptions between random variables are compactly represented by network parameters. The parameters are learned on-line, and approximations are used to perform inference and to compute the optimal value function. The relative effects of inference and value function approximations on the quality of the final policy are investigated, by learning to solve a moderately difficult driving task. The two value function approximations, linear and quadratic, were found to perform similarly, but the quadratic model was more sensitive to initialization. Both performed below the level of human performance on the task. The dynamic Bayesian network performed comparably to a model using a localist hidden state representation, while requiring exponentially fewer parameters.
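The sketch below is an illustrative reading of the abstract, not code from the paper: it represents a factored belief state as independent per-variable probabilities (rather than a distribution over all joint hidden states) and fits linear and quadratic value functions over those belief features with a simple temporal-difference update. All names, the update rule, and the feature choice are assumptions made for clarity.

```python
# Illustrative sketch only: factored belief features with linear and
# quadratic value approximations. Details are assumptions, not the
# paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

n_factors = 4  # binary hidden state variables; belief uses n_factors
               # numbers instead of 2**n_factors joint probabilities

# Factored belief: P(factor_i = 1) for each hidden variable.
belief = np.full(n_factors, 0.5)

# Linear value function: V(b) = w . b
w_lin = np.zeros(n_factors)

# Quadratic value function: V(b) = b^T Q b + w . b
Q = np.zeros((n_factors, n_factors))
w_quad = np.zeros(n_factors)

def value_linear(b):
    return w_lin @ b

def value_quadratic(b):
    return b @ Q @ b + w_quad @ b

def td_update_linear(b, reward, b_next, alpha=0.1, gamma=0.95):
    """One TD(0) update of the linear model (hypothetical form)."""
    global w_lin
    td_error = reward + gamma * value_linear(b_next) - value_linear(b)
    w_lin += alpha * td_error * b  # gradient of V(b) w.r.t. w is b
    return td_error

# Example usage with a dummy belief transition.
b_next = np.clip(belief + 0.1 * rng.standard_normal(n_factors), 0.0, 1.0)
err = td_update_linear(belief, reward=1.0, b_next=b_next)
print("TD error:", err, "linear value:", value_linear(b_next))
```

The quadratic model adds pairwise interaction terms between belief features; as the abstract notes, such a model can be more sensitive to how its parameters are initialized.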
Cite
Text
Sallans. "Learning Factored Representations for Partially Observable Markov Decision Processes." Neural Information Processing Systems, 1999.
Markdown
[Sallans. "Learning Factored Representations for Partially Observable Markov Decision Processes." Neural Information Processing Systems, 1999.](https://mlanthology.org/neurips/1999/sallans1999neurips-learning/)
BibTeX
@inproceedings{sallans1999neurips-learning,
title = {{Learning Factored Representations for Partially Observable Markov Decision Processes}},
author = {Sallans, Brian},
booktitle = {Neural Information Processing Systems},
year = {1999},
pages = {1050-1056},
url = {https://mlanthology.org/neurips/1999/sallans1999neurips-learning/}
}