A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory

Abstract

We describe a Reinforcement Learning algorithm for partially observable environments using short-term memory, which we call BLHT. Since BLHT learns a stochastic model based on Bayesian learning, the overfitting problem is reasonably solved. Moreover, BLHT has an efficient implementation. This paper shows that the model learned by BLHT converges to one which provides the most accurate predictions of percepts and rewards, given short-term memory.
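
To make the abstract's core idea concrete, the following is a minimal sketch of Bayesian prediction of percepts and rewards conditioned on a short-term memory. This is not the paper's actual BLHT algorithm; it is an illustrative toy in which the last k action/percept pairs serve as the conditioning context and Dirichlet-smoothed counts give posterior predictive probabilities. All class, method, and parameter names here are assumptions for exposition.

```python
from collections import defaultdict, deque

class ShortTermMemoryModel:
    """Toy sketch of Bayesian prediction from short-term memory.

    NOT the paper's BLHT algorithm; it only illustrates the idea in
    the abstract. The memory is the last k (action, percept) pairs,
    and predictions of (percept, reward) outcomes come from
    Dirichlet-smoothed counts. All names/parameters are assumptions.
    """

    def __init__(self, k=3, n_outcomes=16, alpha=1.0):
        self.k = k                    # length of the short-term memory
        self.alpha = alpha            # symmetric Dirichlet pseudo-count
        self.n_outcomes = n_outcomes  # size of the (percept, reward) space
        self.counts = defaultdict(lambda: defaultdict(float))
        self.memory = deque(maxlen=k)  # rolling window of (action, percept)

    def update(self, action, percept, reward):
        """Record the outcome observed after `action` in the current
        memory context, then roll the memory forward."""
        context = (tuple(self.memory), action)
        self.counts[context][(percept, reward)] += 1.0
        self.memory.append((action, percept))

    def predict(self, action, outcome):
        """Posterior predictive probability of `outcome`, a
        (percept, reward) pair, after taking `action` now."""
        context = (tuple(self.memory), action)
        c = self.counts[context]
        total = sum(c.values())
        return (c.get(outcome, 0.0) + self.alpha) / (total + self.alpha * self.n_outcomes)
```

Because counts are keyed by the current memory context, `predict` only reflects updates made in the same memory state; the Dirichlet pseudo-count `alpha` supplies the Bayesian smoothing that tempers overfitting on rarely visited contexts, which is the property the abstract attributes to Bayesian learning.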

Cite

Text

Suematsu and Hayashi. "A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory." Neural Information Processing Systems, 1998.

Markdown

[Suematsu and Hayashi. "A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory." Neural Information Processing Systems, 1998.](https://mlanthology.org/neurips/1998/suematsu1998neurips-reinforcement/)

BibTeX

@inproceedings{suematsu1998neurips-reinforcement,
  title     = {{A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory}},
  author    = {Suematsu, Nobuo and Hayashi, Akira},
  booktitle = {Neural Information Processing Systems},
  year      = {1998},
  pages     = {1059--1065},
  url       = {https://mlanthology.org/neurips/1998/suematsu1998neurips-reinforcement/}
}