Reinforcement Learning Based on On-Line EM Algorithm

Abstract

In this article, we propose a new reinforcement learning (RL) method based on an actor-critic architecture. The actor and the critic are approximated by Normalized Gaussian Networks (NGnet), which are networks of local linear regression units. The NGnet is trained by the on-line EM algorithm proposed in our previous paper. We apply our RL method to the task of swinging up and stabilizing a single pendulum and the task of balancing a double pendulum near the upright position. The experimental results show that our RL method can be applied to optimal control problems having continuous state/action spaces and that the method achieves good control with a small number of trials.
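For intuition, an NGnet of the kind described in the abstract computes a softmax-normalized mixture of local linear models: y(x) = sum_i N_i(x) (W_i x + b_i), where N_i(x) = G_i(x) / sum_j G_j(x) and G_i is the i-th Gaussian unit. The following is a minimal sketch of this forward pass in Python/NumPy, assuming diagonal Gaussian units; the class and parameter names are illustrative rather than the paper's notation, and the on-line EM training step is omitted.

import numpy as np

class NGnet:
    """Sketch of a Normalized Gaussian Network: a softmax-normalized
    mixture of M local linear regression units (diagonal Gaussians assumed)."""

    def __init__(self, centers, widths, weights, biases):
        self.centers = centers   # (M, D) Gaussian centers
        self.widths = widths     # (M, D) per-dimension standard deviations
        self.weights = weights   # (M, O, D) local linear matrices W_i
        self.biases = biases     # (M, O) local offsets b_i

    def __call__(self, x):
        # Unnormalized Gaussian activations G_i(x).
        diff = (x - self.centers) / self.widths          # (M, D)
        g = np.exp(-0.5 * np.sum(diff ** 2, axis=1))     # (M,)
        # Normalized activations N_i(x) = G_i(x) / sum_j G_j(x).
        n = g / np.sum(g)
        # Local linear predictions W_i x + b_i, mixed by N_i(x).
        local = self.weights @ x + self.biases           # (M, O)
        return n @ local                                 # (O,)

# Toy usage: a 2-unit NGnet mapping a 2-D state to a 1-D output,
# e.g. a critic's value estimate at a single state.
rng = np.random.default_rng(0)
net = NGnet(centers=rng.normal(size=(2, 2)),
            widths=np.ones((2, 2)),
            weights=rng.normal(size=(2, 1, 2)),
            biases=np.zeros((2, 1)))
print(net(np.array([0.5, -0.3])))   # -> array of shape (1,)

In the paper's setting, two such networks would play the roles of actor and critic, with their centers, widths, and linear parameters adapted on-line by the EM updates rather than fixed as in this sketch.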

Cite

Text

Sato and Ishii. "Reinforcement Learning Based on On-Line EM Algorithm." Neural Information Processing Systems, 1998.

Markdown

[Sato and Ishii. "Reinforcement Learning Based on On-Line EM Algorithm." Neural Information Processing Systems, 1998.](https://mlanthology.org/neurips/1998/sato1998neurips-reinforcement/)

BibTeX

@inproceedings{sato1998neurips-reinforcement,
  title     = {{Reinforcement Learning Based on On-Line EM Algorithm}},
  author    = {Sato, Masa-aki and Ishii, Shin},
  booktitle = {Neural Information Processing Systems},
  year      = {1998},
  pages     = {1052--1058},
  url       = {https://mlanthology.org/neurips/1998/sato1998neurips-reinforcement/}
}