A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory
Abstract
We describe a Reinforcement Learning algorithm for partially observable environments using short-term memory, which we call BLHT. Since BLHT learns a stochastic model based on Bayesian Learning, the overfitting problem is reasonably solved. Moreover, BLHT has an efficient implementation. This paper shows that the model learned by BLHT converges to one which provides the most accurate predictions of percepts and rewards, given short-term memory.
Cite
Text
Suematsu and Hayashi. "A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory." Neural Information Processing Systems, 1998.
Markdown
[Suematsu and Hayashi. "A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory." Neural Information Processing Systems, 1998.](https://mlanthology.org/neurips/1998/suematsu1998neurips-reinforcement/)
BibTeX
@inproceedings{suematsu1998neurips-reinforcement,
title = {{A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory}},
author = {Suematsu, Nobuo and Hayashi, Akira},
booktitle = {Neural Information Processing Systems},
year = {1998},
pages = {1059--1065},
url = {https://mlanthology.org/neurips/1998/suematsu1998neurips-reinforcement/}
}