Creating Advice-Taking Reinforcement Learners

Abstract

Learning from reinforcements is a promising approach for creating intelligent agents. However, reinforcement learning usually requires a large number of training episodes. We present and evaluate a design that addresses this shortcoming by allowing a connectionist Q-learner to accept advice given, at any time and in a natural manner, by an external observer. In our approach, the advice-giver watches the learner and occasionally makes suggestions, expressed as instructions in a simple imperative programming language. Based on techniques from knowledge-based neural networks, we insert these programs directly into the agent's utility function. Subsequent reinforcement learning further integrates and refines the advice. We present empirical evidence that investigates several aspects of our approach and shows that, given good advice, a learner can achieve statistically significant gains in expected reward. A second experiment shows that advice improves the expected reward regardless of the stage of training at which it is given, while another study demonstrates that subsequent advice can result in further gains in reward. Finally, we present experimental results that indicate our method is more powerful than a naive technique for making use of advice.
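The core idea the abstract describes — compiling an advice rule into the network that computes the agent's utility function — can be illustrated with a minimal sketch. This is not the authors' implementation; the class, the weight magnitude `omega`, and the bias convention are assumptions, loosely following the knowledge-based neural network (KBANN-style) recipe of adding a hidden unit whose weights encode the rule's condition and wiring it to the advised action's Q-output:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class AdviceQNet:
    """Illustrative sketch (not the paper's code): a tiny connectionist
    Q-function that can absorb a rule of the form
    'IF <condition on inputs> THEN prefer <action>'."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rnd = random.Random(seed)
        # Small random initial weights, as in a freshly initialized Q-net.
        self.W1 = [[rnd.gauss(0, 0.1) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rnd.gauss(0, 0.1) for _ in range(n_hidden)]
                   for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def q_values(self, state):
        h = [sigmoid(sum(w * s for w, s in zip(row, state)) + b)
             for row, b in zip(self.W1, self.b1)]
        return [sum(w * a for w, a in zip(row, h)) + b
                for row, b in zip(self.W2, self.b2)]

    def add_advice(self, antecedents, action, omega=4.0):
        """Insert one rule as a new hidden unit (KBANN-style convention,
        assumed here: 'omega' sets how strongly the advice is asserted)."""
        # The new unit fires when the rule's antecedents hold.
        self.W1.append([omega * a for a in antecedents])
        n_true = sum(1 for a in antecedents if a)
        self.b1.append(-omega * (n_true - 0.5))
        # Connect it only to the advised action's Q-output; subsequent
        # reinforcement learning can refine (or unlearn) this link.
        for i, row in enumerate(self.W2):
            row.append(omega if i == action else 0.0)

# Hypothetical usage: advise 'IF input 0 AND input 1 THEN prefer action 2'.
net = AdviceQNet(n_inputs=3, n_hidden=4, n_actions=3)
net.add_advice([1, 1, 0], action=2)
q = net.q_values([1.0, 1.0, 0.0])  # condition satisfied: action 2 boosted
```

Because the advice enters as ordinary weights rather than a fixed rule, later Q-learning updates can refine it against experience, which is the property the experiments in the paper evaluate.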

Cite

Text

Maclin and Shavlik. "Creating Advice-Taking Reinforcement Learners." Machine Learning, 1996. doi:10.1023/A:1018020625251

Markdown

[Maclin and Shavlik. "Creating Advice-Taking Reinforcement Learners." Machine Learning, 1996.](https://mlanthology.org/mlj/1996/maclin1996mlj-creating/) doi:10.1023/A:1018020625251

BibTeX

@article{maclin1996mlj-creating,
  title     = {{Creating Advice-Taking Reinforcement Learners}},
  author    = {Maclin, Richard and Shavlik, Jude W.},
  journal   = {Machine Learning},
  year      = {1996},
  pages     = {251--281},
  doi       = {10.1023/A:1018020625251},
  volume    = {22},
  url       = {https://mlanthology.org/mlj/1996/maclin1996mlj-creating/}
}