Interpolation-Based Q-Learning
Abstract
We consider a variant of Q-learning in continuous state spaces under the total expected discounted cost criterion combined with local function approximation methods. Provided that the function approximator satisfies certain interpolation properties, the resulting algorithm is shown to converge with probability one. The limit function is shown to satisfy a fixed point equation of the Bellman type, where the fixed point operator depends on the stationary distribution of the exploration policy and the function approximation method. The basic algorithm is extended in several ways. In particular, a variant of the algorithm is obtained that is shown to converge in probability to the optimal Q function. Preliminary computer simulations are presented that confirm the validity of the approach.
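To make the idea concrete, here is a minimal sketch of Q-learning with an interpolative local function approximator: Q-values are stored only at a grid of anchor states, queried by linear interpolation, and every anchor is updated in proportion to its interpolation weight. The toy 1-D chain environment, grid spacing, step size, and exploration policy below are illustrative assumptions, not the paper's setup; what matters is that the interpolation coefficients are nonnegative and sum to one, the kind of averaging property the convergence analysis relies on.

```python
import numpy as np

rng = np.random.default_rng(0)

GRID = np.linspace(0.0, 1.0, 11)      # anchor states of the interpolator
N_ACTIONS = 2                         # 0 = move left, 1 = move right
GAMMA = 0.9                           # discount factor for total cost
ALPHA = 0.1                           # constant step size (assumption)

Q = np.zeros((len(GRID), N_ACTIONS))  # Q-values stored only at anchors


def weights(s):
    """Linear-interpolation coefficients of state s over the grid.

    Coefficients are nonnegative and sum to one (interpolation property).
    """
    s = float(np.clip(s, GRID[0], GRID[-1]))
    i = min(int(np.searchsorted(GRID, s)), len(GRID) - 1)
    w = np.zeros(len(GRID))
    if i == 0:
        w[0] = 1.0
        return w
    lo, hi = GRID[i - 1], GRID[i]
    t = (s - lo) / (hi - lo)
    w[i - 1], w[i] = 1.0 - t, t
    return w


def q_value(s, a):
    """Interpolated Q-value at an arbitrary continuous state."""
    return weights(s) @ Q[:, a]


def step(s, a):
    """Toy dynamics (assumption): noisy drift left/right; cost = 1 - s'."""
    drift = 0.1 if a == 1 else -0.1
    s2 = float(np.clip(s + drift + 0.02 * rng.standard_normal(), 0.0, 1.0))
    return s2, 1.0 - s2


s = float(rng.random())
for _ in range(20000):
    a = int(rng.integers(N_ACTIONS))  # uniform exploration policy
    s2, cost = step(s, a)
    # Bellman target under the cost criterion: minimize, not maximize.
    target = cost + GAMMA * min(q_value(s2, b) for b in range(N_ACTIONS))
    # Spread the temporal-difference update over anchors by their weights.
    Q[:, a] += ALPHA * weights(s) * (target - Q[:, a])
    s = s2
```

Since cost decreases as the state moves right, the learned greedy policy at interior states should prefer action 1; the grid representation keeps the update purely local, touching at most two anchors per step.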
Cite
Text
Szepesvári and Smart. "Interpolation-Based Q-Learning." International Conference on Machine Learning, 2004. doi:10.1145/1015330.1015445
Markdown
[Szepesvári and Smart. "Interpolation-Based Q-Learning." International Conference on Machine Learning, 2004.](https://mlanthology.org/icml/2004/szepesvari2004icml-interpolation/) doi:10.1145/1015330.1015445
BibTeX
@inproceedings{szepesvari2004icml-interpolation,
title = {{Interpolation-Based Q-Learning}},
author = {Szepesvári, Csaba and Smart, William D.},
booktitle = {International Conference on Machine Learning},
year = {2004},
doi = {10.1145/1015330.1015445},
url = {https://mlanthology.org/icml/2004/szepesvari2004icml-interpolation/}
}