Convergent Combinations of Reinforcement Learning with Linear Function Approximation

Abstract

Convergence of iterative reinforcement learning algorithms like TD(0) depends on the sampling strategy for the transitions. However, in practical applications it is convenient to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. Our main theorem yields sufficient conditions of convergence for combinations of reinforcement learning algorithms and linear function approximation. This allows us to analyse whether a certain reinforcement learning algorithm and a certain function approximator are compatible. For the combination of the residual gradient algorithm with grid-based linear interpolation we show that there exists a universal constant learning rate such that the iteration converges independently of the concrete transition data.
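The setting the abstract describes — repeated synchronous updates over a fixed set of transitions with a linear function approximator — can be sketched as follows. This is a minimal illustration of the residual gradient algorithm, not the paper's analysis: the function name, feature matrices `phi_s`/`phi_s2`, and the particular parameter choices are assumptions introduced here for illustration.

```python
import numpy as np

def residual_gradient_sync(phi_s, phi_s2, r, gamma=0.9, alpha=0.1, n_iters=500):
    """Synchronous residual-gradient updates over a fixed transition set.

    phi_s, phi_s2 : (n_transitions, n_features) feature matrices for the
                    source and successor states of each stored transition
    r             : (n_transitions,) observed rewards

    Each sweep computes the Bellman residuals of all transitions at once
    and takes a gradient step on the summed squared residual, so the
    iteration is plain gradient descent on a fixed quadratic objective.
    """
    w = np.zeros(phi_s.shape[1])
    for _ in range(n_iters):
        # Bellman residual delta_i = r_i + gamma * V(s'_i) - V(s_i)
        delta = r + gamma * (phi_s2 @ w) - phi_s @ w
        # Gradient of 1/2 * sum_i delta_i^2 with respect to w
        grad = (gamma * phi_s2 - phi_s).T @ delta
        w = w - alpha * grad
    return w
```

Because the update is gradient descent on a convex quadratic built from the fixed transition set, it converges for any sufficiently small constant step size regardless of which transitions were stored — which is the intuition behind the paper's stronger claim of a universal constant learning rate for the grid-based interpolation case.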

Cite

Text

Schoknecht and Merke. "Convergent Combinations of Reinforcement Learning with Linear Function Approximation." Neural Information Processing Systems, 2002.

Markdown

[Schoknecht and Merke. "Convergent Combinations of Reinforcement Learning with Linear Function Approximation." Neural Information Processing Systems, 2002.](https://mlanthology.org/neurips/2002/schoknecht2002neurips-convergent/)

BibTeX

@inproceedings{schoknecht2002neurips-convergent,
  title     = {{Convergent Combinations of Reinforcement Learning with Linear Function Approximation}},
  author    = {Schoknecht, Ralf and Merke, Artur},
  booktitle = {Neural Information Processing Systems},
  year      = {2002},
  pages     = {1611-1618},
  url       = {https://mlanthology.org/neurips/2002/schoknecht2002neurips-convergent/}
}