Learning Without State-Estimation in Partially Observable Markovian Decision Processes
Abstract
Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately, all of the theory and much of the practice (see Barto et al., 1983, for an exception) of RL is limited to Markovian decision processes (MDPs). Many real-world decision tasks, however, are inherently non-Markovian, i.e., the state of the environment is only incompletely known to the learning agent. In this paper we consider only partially observable MDPs (POMDPs), a useful class of non-Markovian decision processes. Most previous approaches to such problems have combined computationally expensive state-estimation techniques with learning control. This paper investigates learning in POMDPs without resorting to any form of state estimation. We present results about what TD(0) and Q-learning will do when applied to POMDPs. It is shown that the conventional discounted RL framework is inadequate to deal with POMDPs. Finally, we develop a new framework for learning without state-estimation in POMDPs by including stochastic policies in the search space, and by defining the value or utility of a distribution over states.
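To make the failure mode concrete, the following is a minimal sketch (not the paper's experiments; the environment, rewards, and hyperparameters are all illustrative assumptions) of tabular Q-learning applied directly to observations in a tiny POMDP. Two hidden states alias to the same observation but demand opposite actions, so the observation-indexed Q-values mix the two hidden states:

```python
import random

random.seed(0)

# Hidden states 0 and 1 both emit observation "A" (perceptual aliasing);
# state 2 emits "B". This environment is an illustrative assumption.
OBS = {0: "A", 1: "A", 2: "B"}
ACTIONS = [0, 1]

def step(state, action):
    """Illustrative dynamics: the optimal action differs across the
    two hidden states that share observation "A"."""
    if state == 2:
        return random.choice([0, 1]), 0.0  # restart in an aliased state
    reward = 1.0 if (state == 0 and action == 0) or (state == 1 and action == 1) else 0.0
    return 2, reward

def q_learning(steps=5000, alpha=0.1, gamma=0.9, eps=0.1):
    # Q is indexed by (observation, action), not by hidden state:
    # the agent does no state estimation.
    Q = {(o, a): 0.0 for o in "AB" for a in ACTIONS}
    state = 0
    for _ in range(steps):
        obs = OBS[state]
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(obs, x)])
        nxt, r = step(state, a)
        nobs = OBS[nxt]
        best = max(Q[(nobs, b)] for b in ACTIONS)
        Q[(obs, a)] += alpha * (r + gamma * best - Q[(obs, a)])
        state = nxt
    return Q

Q = q_learning()
print(Q)
```

Because hidden states 0 and 1 alias to observation "A" with opposite optimal actions, the learned values of ("A", 0) and ("A", 1) each drift toward an average over the hidden states rather than either true state-action value, which is why the paper turns to stochastic policies and values defined over distributions of states.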
Singh et al. "Learning Without State-Estimation in Partially Observable Markovian Decision Processes." International Conference on Machine Learning, 1994. doi:10.1016/B978-1-55860-335-6.50042-8
@inproceedings{singh1994icml-learning,
title = {{Learning Without State-Estimation in Partially Observable Markovian Decision Processes}},
author = {Singh, Satinder P. and Jaakkola, Tommi S. and Jordan, Michael I.},
booktitle = {International Conference on Machine Learning},
year = {1994},
pages = {284-292},
doi = {10.1016/B978-1-55860-335-6.50042-8},
url = {https://mlanthology.org/icml/1994/singh1994icml-learning/}
}