Generalization and Scaling in Reinforcement Learning

Abstract

In associative reinforcement learning, an environment generates input vectors, a learning system generates possible output vectors, and a reinforcement function computes feedback signals from the input-output pairs. The task is to discover and remember input-output pairs that generate rewards. Especially difficult cases occur when rewards are rare, since the expected time for any algorithm can grow exponentially with the size of the problem. Nonetheless, if a reinforcement function possesses regularities, and a learning algorithm exploits them, learning time can be reduced below that of non-generalizing algorithms. This paper describes a neural network algorithm called complementary reinforcement back-propagation (CRBP), and reports simulation results on problems designed to offer differing opportunities for generalization.
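The sketch below illustrates the associative reinforcement learning loop and the CRBP-style update suggested by the abstract: a network emits a stochastic binary output vector for each input, and is back-propagated toward that output when the pair is rewarded, or toward its complement when it is punished. The network size, the XOR-style reinforcement function, and the learning rate are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a CRBP-style update, assuming a one-hidden-layer
# sigmoid network and a toy XOR reinforcement function (both assumptions,
# not specifics from the paper).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reinforcement(x, o):
    # Hypothetical reinforcement function: reward iff the single output
    # bit equals XOR of the two input bits.
    return 1.0 if o[0] == (int(x[0]) ^ int(x[1])) else -1.0

n_in, n_hid, n_out = 2, 4, 1
W1 = rng.normal(0, 0.5, (n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.5, (n_out, n_hid)); b2 = np.zeros(n_out)
lr = 0.3
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

for step in range(5000):
    x = inputs[rng.integers(len(inputs))]
    # Forward pass: real-valued output probabilities.
    h = sigmoid(W1 @ x + b1)
    p = sigmoid(W2 @ h + b2)
    # Sample a stochastic binary output vector.
    o = (rng.random(n_out) < p).astype(float)
    r = reinforcement(x, o)
    # CRBP target: the emitted output if rewarded, its complement otherwise.
    target = o if r > 0 else 1.0 - o
    # Ordinary back-propagation of squared error toward the target.
    d_out = (p - target) * p * (1 - p)
    d_hid = (W2.T @ d_out) * h * (1 - h)
    W2 -= lr * np.outer(d_out, h); b2 -= lr * d_out
    W1 -= lr * np.outer(d_hid, x); b1 -= lr * d_hid

# After training, the deterministic outputs should mostly match XOR.
for x in inputs:
    h = sigmoid(W1 @ x + b1)
    print(x, (sigmoid(W2 @ h + b2) > 0.5).astype(int))
```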

Cite

Text

Ackley and Littman. "Generalization and Scaling in Reinforcement Learning." Neural Information Processing Systems, 1989.

Markdown

[Ackley and Littman. "Generalization and Scaling in Reinforcement Learning." Neural Information Processing Systems, 1989.](https://mlanthology.org/neurips/1989/ackley1989neurips-generalization/)

BibTeX

@inproceedings{ackley1989neurips-generalization,
  title     = {{Generalization and Scaling in Reinforcement Learning}},
  author    = {Ackley, David H. and Littman, Michael L.},
  booktitle = {Neural Information Processing Systems},
  year      = {1989},
  pages     = {550-557},
  url       = {https://mlanthology.org/neurips/1989/ackley1989neurips-generalization/}
}