Convergence of Reinforcement Learning with General Function Approximators

Abstract

A key open problem in reinforcement learning is to assure convergence when using a compact hypothesis class to approximate the value function. Although the standard temporal-difference learning algorithm has been shown to converge when the hypothesis class is a linear combination of fixed basis functions, it may diverge with a general (nonlinear) hypothesis class. This paper describes the Bridge algorithm, a new method for reinforcement learning, and shows that it converges to an approximate global optimum for any agnostically learnable hypothesis class. Convergence is demonstrated on a simple example for which temporal-difference learning fails. Weak conditions are identified under which the Bridge algorithm converges for any hypothesis class. Finally, connections are made between the complexity of reinforcement learning and the PAC-learnability of the hypothesis class.

Cite

Text

Papavassiliou and Russell. "Convergence of Reinforcement Learning with General Function Approximators." International Joint Conference on Artificial Intelligence, 1999.

Markdown

[Papavassiliou and Russell. "Convergence of Reinforcement Learning with General Function Approximators." International Joint Conference on Artificial Intelligence, 1999.](https://mlanthology.org/ijcai/1999/papavassiliou1999ijcai-convergence/)

BibTeX

@inproceedings{papavassiliou1999ijcai-convergence,
  title     = {{Convergence of Reinforcement Learning with General Function Approximators}},
  author    = {Papavassiliou, Vassilis A. and Russell, Stuart},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {1999},
  pages     = {748--757},
  url       = {https://mlanthology.org/ijcai/1999/papavassiliou1999ijcai-convergence/}
}