Generalization in Reinforcement Learning: Safely Approximating the Value Function

Abstract

A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximator such as a neural net. Although this has been successful in the domain of backgammon, there is no guarantee of convergence. In this paper, we show that the combination of dynamic programming and function approximation is not robust, and in even very benign cases, may produce an entirely wrong policy. We then introduce Grow-Support, a new algorithm which is safe from divergence yet can still reap the benefits of successful generalization.
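To make the combination the abstract describes concrete, here is a minimal sketch of fitted value iteration: dynamic-programming backups whose lookup table is replaced by a generalizing function approximator. This is an illustration under assumed toy settings (a random deterministic MDP, polynomial features, least-squares refitting), not the paper's Grow-Support algorithm; it only shows where the divergence risk enters, since each backup target depends on the approximated values of successor states.

```python
# Hypothetical toy example: fitted value iteration with a linear approximator.
import numpy as np

n_states, n_actions, gamma = 50, 2, 0.95
rng = np.random.default_rng(0)

# Random deterministic transitions and rewards (illustrative MDP, not from the paper).
next_state = rng.integers(n_states, size=(n_states, n_actions))
reward = rng.normal(size=(n_states, n_actions))

def features(s):
    # Simple polynomial features of the normalized state index (an assumption).
    x = s / (n_states - 1)
    return np.array([1.0, x, x ** 2])

Phi = np.stack([features(s) for s in range(n_states)])  # feature matrix
w = np.zeros(Phi.shape[1])                              # approximator weights

for sweep in range(100):
    v = Phi @ w                                          # current approximate values
    # One-step Bellman backup targets, computed from the *approximation*.
    targets = np.max(reward + gamma * v[next_state], axis=1)
    # Refit the approximator to the targets by least squares,
    # replacing the lookup-table update of exact value iteration.
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

print("approximate values of first states:", (Phi @ w)[:5])
```

Because the targets are regenerated from the fitted values on every sweep, approximation error can feed back into the backups; this feedback loop is the failure mode the paper demonstrates and that Grow-Support is designed to avoid.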

Cite

Text

Boyan and Moore. "Generalization in Reinforcement Learning: Safely Approximating the Value Function." Neural Information Processing Systems, 1994.

Markdown

[Boyan and Moore. "Generalization in Reinforcement Learning: Safely Approximating the Value Function." Neural Information Processing Systems, 1994.](https://mlanthology.org/neurips/1994/boyan1994neurips-generalization/)

BibTeX

@inproceedings{boyan1994neurips-generalization,
  title     = {{Generalization in Reinforcement Learning: Safely Approximating the Value Function}},
  author    = {Boyan, Justin A. and Moore, Andrew W.},
  booktitle = {Neural Information Processing Systems},
  year      = {1994},
  pages     = {369--376},
  url       = {https://mlanthology.org/neurips/1994/boyan1994neurips-generalization/}
}