Reinforcement Learning with Function Approximation Converges to a Region
Abstract
Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are counterexamples showing that the adjustable weights in some algorithms may oscillate within a region rather than converging to a point. This paper shows that, for two popular algorithms, such oscillation is the worst that can happen: the weights cannot diverge, but instead must converge to a bounded region. The algorithms are SARSA(0) and V(0); the latter algorithm was used in the well-known TD-Gammon program.
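To make the setting concrete, below is a minimal sketch (not from the paper) of SARSA(0) with linear function approximation, the kind of algorithm whose weights the paper shows stay in a bounded region. The toy MDP, features, step size, and epsilon-greedy policy are illustrative assumptions; the paper's actual conditions on the policy and step sizes are more precise.

```python
# Illustrative sketch of SARSA(0) with linear function approximation.
# The MDP, features, and hyperparameters below are assumptions for
# demonstration only, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, n_features = 5, 2, 4
# Random fixed features phi(s, a) and a random toy MDP (transitions P, rewards R).
phi = rng.normal(size=(n_states, n_actions, n_features))
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

gamma, alpha, epsilon = 0.9, 0.05, 0.1
w = np.zeros(n_features)  # the adjustable weights

def q(s, a):
    """Linear action-value estimate: q(s, a) = w . phi(s, a)."""
    return w @ phi[s, a]

def eps_greedy(s):
    """Epsilon-greedy action selection with respect to the current q."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax([q(s, a) for a in range(n_actions)]))

s = 0
a = eps_greedy(s)
max_norm = 0.0
for t in range(20000):
    r = R[s, a]
    s_next = rng.choice(n_states, p=P[s, a])
    a_next = eps_greedy(s_next)
    # SARSA(0) temporal-difference update of the linear weights.
    delta = r + gamma * q(s_next, a_next) - q(s, a)
    w += alpha * delta * phi[s, a]
    max_norm = max(max_norm, float(np.linalg.norm(w)))
    s, a = s_next, a_next

# The weights may oscillate, but per the paper's result they remain bounded.
print(f"final ||w|| = {np.linalg.norm(w):.3f}, max ||w|| = {max_norm:.3f}")
```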
Cite

Text
Gordon. "Reinforcement Learning with Function Approximation Converges to a Region." Neural Information Processing Systems, 2000.

BibTeX
@inproceedings{gordon2000neurips-reinforcement,
title = {{Reinforcement Learning with Function Approximation Converges to a Region}},
author = {Gordon, Geoffrey J.},
booktitle = {Neural Information Processing Systems},
year = {2000},
pages = {1040--1046},
url = {https://mlanthology.org/neurips/2000/gordon2000neurips-reinforcement/}
}