Self-Improvement Based on Reinforcement Learning, Planning and Teaching
Abstract
AHC-learning and Q-learning are slow learning methods. This paper investigates three extensions to these two basic learning algorithms: 1) experience replay, 2) learning action models for planning, and 3) teaching. The basic algorithms and their extensions were evaluated using a dynamic environment as a testbed. The environment is nontrivial and nondeterministic. The results show that the extensions can effectively improve the learning rate and, in many cases, even the asymptotic performance.
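To make the first extension concrete, here is a minimal sketch of tabular Q-learning with experience replay. It is an illustration under assumptions, not Lin's implementation: the class name `ReplayQLearner`, the backward-sweep replay schedule, and all parameter values are hypothetical choices made for this example.

```python
from collections import defaultdict, deque

class ReplayQLearner:
    """Tabular Q-learning augmented with experience replay:
    stored transitions are re-presented to the learner, so each
    real experience drives many value-function updates."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, buffer_size=1000):
        self.q = defaultdict(float)        # Q[(state, action)] -> value
        self.actions = actions
        self.alpha = alpha                 # learning rate
        self.gamma = gamma                 # discount factor
        self.buffer = deque(maxlen=buffer_size)

    def update(self, s, a, r, s2):
        """One Q-learning backup:
        Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        best_next = max(self.q[(s2, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

    def observe(self, s, a, r, s2):
        """Learn from a real transition and store it for later replay."""
        self.buffer.append((s, a, r, s2))
        self.update(s, a, r, s2)

    def replay(self, sweeps=1):
        """Replay the stored transitions; sweeping newest-to-oldest
        propagates reward information backward along trajectories quickly."""
        for _ in range(sweeps):
            for transition in reversed(self.buffer):
                self.update(*transition)
```

On a two-step chain (state 0 -> 1 -> terminal, reward 1 on the final step), a handful of replay sweeps drives Q(0, a) toward gamma * Q(1, a), which plain one-pass Q-learning would need many fresh episodes to achieve; this is the learning-rate improvement the abstract refers to.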
[Lin. "Self-Improvement Based on Reinforcement Learning, Planning and Teaching." International Conference on Machine Learning, 1991.](https://mlanthology.org/icml/1991/lin1991icml-self/) doi:10.1016/B978-1-55860-200-7.50067-2
@inproceedings{lin1991icml-self,
title = {{Self-Improvement Based on Reinforcement Learning, Planning and Teaching}},
author = {Lin, Long Ji},
booktitle = {International Conference on Machine Learning},
year = {1991},
pages = {323-327},
doi = {10.1016/B978-1-55860-200-7.50067-2},
url = {https://mlanthology.org/icml/1991/lin1991icml-self/}
}