Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes
Abstract
Recent research on hidden-state reinforcement learning (RL) problems has concentrated on overcoming partial observability by using memory to estimate state. However, such methods are computationally extremely expensive and thus have very limited applicability. This emphasis on state estimation has come about because it has been widely observed that the presence of hidden state or partial observability renders popular RL methods such as Q-learning and Sarsa useless. However, this observation is misleading in two ways: first, the theoretical results supporting it apply only to RL algorithms that do not use eligibility traces, and second, these results are worst-case results, which leaves open the possibility that there may be large classes of hidden-state problems in which RL algorithms work well without any state estimation. In this paper we show empirically that Sarsa(λ), a well-known family of RL algorithms that use eligibility traces, can work very well on hidden-state problems that ...
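To make the idea of a memoryless learner with eligibility traces concrete, the following is a minimal sketch of one tabular Sarsa(λ) update in which values are indexed by the current observation rather than by an estimated hidden state. It is an illustration only, not the authors' implementation: the function name `sarsa_lambda_step`, the list-of-lists tables `Q` and `E`, the default step size, discount, and λ values, and the choice of replacing traces are all assumptions of this sketch.

```python
def sarsa_lambda_step(Q, E, obs, act, reward, next_obs, next_act,
                      alpha=0.1, gamma=0.9, lam=0.9, done=False):
    """One tabular Sarsa(lambda) update keyed on observations (hypothetical helper)."""
    # TD error: bootstrap from the next observation-action value,
    # or from zero if the episode has ended.
    target = reward if done else reward + gamma * Q[next_obs][next_act]
    delta = target - Q[obs][act]

    # Replacing trace (an assumption of this sketch): the visited
    # observation-action pair gets its trace reset to 1.
    E[obs][act] = 1.0

    # Every pair is updated in proportion to its trace, then all traces decay.
    for o in range(len(Q)):
        for a in range(len(Q[o])):
            Q[o][a] += alpha * delta * E[o][a]
            E[o][a] *= gamma * lam
    return delta
```

Because the trace of an earlier observation-action pair is still nonzero when a later reward arrives, that earlier pair receives credit directly from the reward rather than only through bootstrapped value estimates of intermediate observations.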
Cite
Text
Loch and Singh. "Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes." International Conference on Machine Learning, 1998.
Markdown
[Loch and Singh. "Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes." International Conference on Machine Learning, 1998.](https://mlanthology.org/icml/1998/loch1998icml-using/)
BibTeX
@inproceedings{loch1998icml-using,
title = {{Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes}},
author = {Loch, John and Singh, Satinder},
booktitle = {International Conference on Machine Learning},
year = {1998},
pages = {323-331},
url = {https://mlanthology.org/icml/1998/loch1998icml-using/}
}