Convergent Combinations of Reinforcement Learning with Linear Function Approximation
Abstract
Convergence for iterative reinforcement learning algorithms like TD(0) depends on the sampling strategy for the transitions. However, in practical applications it is convenient to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. Our main theorem yields sufficient conditions of convergence for combinations of reinforcement learning algorithms and linear function approximation. This makes it possible to analyse whether a certain reinforcement learning algorithm and a certain function approximator are compatible. For the combination of the residual gradient algorithm with grid-based linear interpolation we show that there exists a universal constant learning rate such that the iteration converges independently of the concrete transition data.
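The setting the abstract describes, repeated synchronous residual-gradient sweeps over a fixed set of transitions with grid-based linear interpolation features, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 1-D hat-function features, the example transitions, and all parameter values are assumptions chosen for clarity.

```python
import numpy as np

def hat_features(s, grid):
    """Grid-based linear interpolation features: hat functions on 1-D grid points."""
    phi = np.zeros(len(grid))
    i = np.searchsorted(grid, s)
    if i == 0:
        phi[0] = 1.0                      # s at or below the first grid point
    elif i >= len(grid):
        phi[-1] = 1.0                     # s at or above the last grid point
    else:
        w = (s - grid[i - 1]) / (grid[i] - grid[i - 1])
        phi[i - 1] = 1.0 - w              # interpolate between the two neighbours
        phi[i] = w
    return phi

def residual_gradient_sweep(theta, transitions, grid, alpha, gamma):
    """One synchronous residual-gradient update over a fixed transition set.

    Each transition is (s, r, s_next); the value function is V(s) = phi(s)^T theta.
    Residual gradient descends on the squared Bellman residual, so the update
    direction is (phi - gamma * phi_next) rather than TD(0)'s phi alone.
    """
    update = np.zeros_like(theta)
    for s, r, s_next in transitions:
        phi, phi_next = hat_features(s, grid), hat_features(s_next, grid)
        delta = r + gamma * (phi_next @ theta) - phi @ theta   # Bellman residual
        update += alpha * delta * (phi - gamma * phi_next)
    return theta + update
```

Because each sweep is a gradient-descent step on a fixed quadratic objective, a small enough constant learning rate makes the iteration converge regardless of which transitions populate the set, which is the flavour of result the paper proves for this combination.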
Cite
Text
Schoknecht and Merke. "Convergent Combinations of Reinforcement Learning with Linear Function Approximation." Neural Information Processing Systems, 2002.

Markdown

[Schoknecht and Merke. "Convergent Combinations of Reinforcement Learning with Linear Function Approximation." Neural Information Processing Systems, 2002.](https://mlanthology.org/neurips/2002/schoknecht2002neurips-convergent/)

BibTeX
@inproceedings{schoknecht2002neurips-convergent,
title = {{Convergent Combinations of Reinforcement Learning with Linear Function Approximation}},
author = {Schoknecht, Ralf and Merke, Artur},
booktitle = {Neural Information Processing Systems},
year = {2002},
pages = {1611-1618},
url = {https://mlanthology.org/neurips/2002/schoknecht2002neurips-convergent/}
}