Convergence of Reinforcement Learning with General Function Approximators
Abstract
A key open problem in reinforcement learning is to assure convergence when using a compact hypothesis class to approximate the value function. Although the standard temporal-difference learning algorithm has been shown to converge when the hypothesis class is a linear combination of fixed basis functions, it may diverge with a general (nonlinear) hypothesis class. This paper describes the Bridge algorithm, a new method for reinforcement learning, and shows that it converges to an approximate global optimum for any agnostically learnable hypothesis class. Convergence is demonstrated on a simple example for which temporal-difference learning fails. Weak conditions are identified under which the Bridge algorithm converges for any hypothesis class. Finally, connections are made between the complexity of reinforcement learning and the PAC-learnability of the hypothesis class.
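For context, the linear setting the abstract refers to can be illustrated with a minimal sketch (not taken from the paper): TD(0) with a linear approximator V(s) = w · φ(s) on a toy random-walk chain. The feature map, chain size, step size, and discount factor below are illustrative assumptions; linear TD of this kind is the case known to converge, in contrast to the general nonlinear case the Bridge algorithm addresses.

```python
import numpy as np

# Illustrative sketch: TD(0) with a linear function approximator
# on a 5-state random walk. Reward 1 for exiting on the right,
# 0 for exiting on the left. All constants are assumptions.

rng = np.random.default_rng(0)
n_states = 5
gamma = 0.9   # discount factor (illustrative)
alpha = 0.1   # step size (illustrative)

def phi(s):
    # One-hot features: a simple linear basis over states.
    f = np.zeros(n_states)
    f[s] = 1.0
    return f

w = np.zeros(n_states)  # weight vector defining V(s) = w . phi(s)

for episode in range(2000):
    s = 2  # start in the middle of the chain
    while True:
        s_next = s + rng.choice([-1, 1])
        done = s_next < 0 or s_next >= n_states
        r = 1.0 if s_next >= n_states else 0.0
        v_next = 0.0 if done else w @ phi(s_next)
        # TD(0) update: adjust w along phi(s) by the TD error.
        td_error = r + gamma * v_next - w @ phi(s)
        w += alpha * td_error * phi(s)
        if done:
            break
        s = s_next

print(np.round(w, 2))  # estimated values rise toward the rewarding end
```

With one-hot features this reduces to tabular TD(0), so the weights settle near the true discounted values; replacing φ with a nonlinear approximator is exactly where convergence guarantees break down, motivating the paper's Bridge algorithm.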
Cite

Papavassiliou and Russell. "Convergence of Reinforcement Learning with General Function Approximators." International Joint Conference on Artificial Intelligence, 1999.

BibTeX
@inproceedings{papavassiliou1999ijcai-convergence,
title = {{Convergence of Reinforcement Learning with General Function Approximators}},
author = {Papavassiliou, Vassilis A. and Russell, Stuart},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {1999},
pages = {748-757},
url = {https://mlanthology.org/ijcai/1999/papavassiliou1999ijcai-convergence/}
}