Interpolation-Based Q-Learning
Abstract
We consider a variant of Q-learning in continuous state spaces under the total expected discounted cost criterion combined with local function approximation methods. Provided that the function approximator satisfies certain interpolation properties, the resulting algorithm is shown to converge with probability one. The limit function is shown to satisfy a fixed point equation of the Bellman type, where the fixed point operator depends on the stationary distribution of the exploration policy and the function approximation method. The basic algorithm is extended in several ways. In particular, a variant of the algorithm is obtained that is shown to converge in probability to the optimal Q function. Preliminary computer simulations are presented that confirm the validity of the approach.
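To make the idea concrete, here is a minimal sketch of Q-learning with an interpolative local function approximator: Q-values are stored only at a grid of anchor states, queried by linear interpolation, and every anchor is updated in proportion to its interpolation weight. The toy 1-D chain environment, grid spacing, step size, and exploration policy below are illustrative assumptions, not the paper's setup; what matters is that the interpolation coefficients are nonnegative and sum to one, the kind of averaging property the convergence analysis relies on.

```python
import numpy as np

rng = np.random.default_rng(0)

GRID = np.linspace(0.0, 1.0, 11)      # anchor states of the interpolator
N_ACTIONS = 2                         # 0 = move left, 1 = move right
GAMMA = 0.9                           # discount factor for total cost
ALPHA = 0.1                           # constant step size (assumption)

Q = np.zeros((len(GRID), N_ACTIONS))  # Q-values stored only at anchors


def weights(s):
    """Linear-interpolation coefficients of state s over the grid.

    Coefficients are nonnegative and sum to one (interpolation property).
    """
    s = float(np.clip(s, GRID[0], GRID[-1]))
    i = min(int(np.searchsorted(GRID, s)), len(GRID) - 1)
    w = np.zeros(len(GRID))
    if i == 0:
        w[0] = 1.0
        return w
    lo, hi = GRID[i - 1], GRID[i]
    t = (s - lo) / (hi - lo)
    w[i - 1], w[i] = 1.0 - t, t
    return w


def q_value(s, a):
    """Interpolated Q-value at an arbitrary continuous state."""
    return weights(s) @ Q[:, a]


def step(s, a):
    """Toy dynamics (assumption): noisy drift left/right; cost = 1 - s'."""
    drift = 0.1 if a == 1 else -0.1
    s2 = float(np.clip(s + drift + 0.02 * rng.standard_normal(), 0.0, 1.0))
    return s2, 1.0 - s2


s = float(rng.random())
for _ in range(20000):
    a = int(rng.integers(N_ACTIONS))  # uniform exploration policy
    s2, cost = step(s, a)
    # Bellman target under the cost criterion: minimize, not maximize.
    target = cost + GAMMA * min(q_value(s2, b) for b in range(N_ACTIONS))
    # Spread the temporal-difference update over anchors by their weights.
    Q[:, a] += ALPHA * weights(s) * (target - Q[:, a])
    s = s2
```

Since cost decreases as the state moves right, the learned greedy policy at interior states should prefer action 1; the grid representation keeps the update purely local, touching at most two anchors per step.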
Cite
Text
Szepesvári and Smart. "Interpolation-Based Q-Learning." International Conference on Machine Learning, 2004. doi:10.1145/1015330.1015445
Markdown
[Szepesvári and Smart. "Interpolation-Based Q-Learning." International Conference on Machine Learning, 2004.](https://mlanthology.org/icml/2004/szepesvari2004icml-interpolation/) doi:10.1145/1015330.1015445
BibTeX
@inproceedings{szepesvari2004icml-interpolation,
title = {{Interpolation-Based Q-Learning}},
author = {Szepesvári, Csaba and Smart, William D.},
booktitle = {International Conference on Machine Learning},
year = {2004},
doi = {10.1145/1015330.1015445},
url = {https://mlanthology.org/icml/2004/szepesvari2004icml-interpolation/}
}