Reinforcement Learning Based on On-Line EM Algorithm
Abstract
In this article, we propose a new reinforcement learning (RL) method based on an actor-critic architecture. The actor and the critic are approximated by Normalized Gaussian Networks (NGnet), which are networks of local linear regression units. The NGnet is trained by the on-line EM algorithm proposed in our previous paper. We apply our RL method to the task of swinging up and stabilizing a single pendulum and the task of balancing a double pendulum near the upright position. The experimental results show that our RL method can be applied to optimal control problems having continuous state/action spaces and that the method achieves good control with a small number of trial-and-error episodes.
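The NGnet mentioned in the abstract blends a set of local linear regressors through normalized Gaussian activations. Below is a minimal sketch of that forward computation in NumPy; it assumes isotropic Gaussian units for brevity, whereas the paper's NGnet uses full covariance Gaussians and fits all parameters (centers, covariances, and regression weights) with the on-line EM algorithm rather than holding them fixed. All names and shapes here are illustrative, not from the paper.

```python
import numpy as np

def ngnet_output(x, mu, sigma2, W, b):
    """Forward pass of a Normalized Gaussian Network (NGnet).

    The output is a convex combination of M local linear regressors,
    weighted by the normalized Gaussian activations of the input.

    x      : (d,)       input vector
    mu     : (M, d)     Gaussian centers
    sigma2 : (M,)       isotropic variances (assumption; the paper
                        uses full covariance Gaussians)
    W      : (M, o, d)  local linear regression matrices
    b      : (M, o)     local linear regression biases
    """
    d = x.shape[0]
    diff = x - mu                                   # (M, d)
    # Log Gaussian densities, including the normalizing constant so
    # that units with different variances are weighted correctly.
    log_act = (-0.5 * np.sum(diff ** 2, axis=1) / sigma2
               - 0.5 * d * np.log(2.0 * np.pi * sigma2))
    act = np.exp(log_act - log_act.max())           # numerically stable
    g = act / act.sum()                             # normalized activations
    local = W @ x + b                               # (M, o) local predictions
    return g @ local                                # (o,) blended output
```

In the actor-critic setting of the paper, one such network approximates the policy (actor) and another the value function (critic), both over continuous state/action spaces.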
Cite
Text
Sato and Ishii. "Reinforcement Learning Based on On-Line EM Algorithm." Neural Information Processing Systems, 1998.
Markdown
[Sato and Ishii. "Reinforcement Learning Based on On-Line EM Algorithm." Neural Information Processing Systems, 1998.](https://mlanthology.org/neurips/1998/sato1998neurips-reinforcement/)
BibTeX
@inproceedings{sato1998neurips-reinforcement,
title = {{Reinforcement Learning Based on On-Line EM Algorithm}},
author = {Sato, Masa-aki and Ishii, Shin},
booktitle = {Neural Information Processing Systems},
year = {1998},
pages = {1052--1058},
url = {https://mlanthology.org/neurips/1998/sato1998neurips-reinforcement/}
}