A Semiparametric Statistical Approach to Model-Free Policy Evaluation

Abstract

Reinforcement learning (RL) methods based on least-squares temporal difference (LSTD) have been developed recently and have shown good practical performance. However, the quality of their estimation has not been well elucidated. In this article, we discuss LSTD-based policy evaluation from the new viewpoint of semiparametric statistical inference. In fact, the estimator can be obtained from a particular estimating function which guarantees its convergence to the true value asymptotically, without specifying a model of the environment. Based on these observations, we 1) analyze the asymptotic variance of an LSTD-based estimator, 2) derive the optimal estimating function with the minimum asymptotic estimation variance, and 3) derive a suboptimal estimator to reduce the computational burden in obtaining the optimal estimating function.
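To make the estimating-function view concrete, the following is a minimal sketch of the standard LSTD(0) estimator with a linear value function V(s) = phi(s)^T theta. It is not the optimal or suboptimal estimator derived in the paper. It solves the sample estimating equation sum_t phi_t (r_t + gamma * phi_{t+1}^T theta - phi_t^T theta) = 0 for theta. The function name `lstd`, the argument names, and the two-state toy chain are assumptions made for illustration only.

```python
import numpy as np


def lstd(phi, phi_next, rewards, gamma):
    """Solve the LSTD(0) estimating equation for the weight vector theta.

    phi, phi_next: (T, d) arrays of features for s_t and s_{t+1}
    rewards:       (T,) array of observed rewards r_t
    gamma:         discount factor in [0, 1)
    """
    phi = np.asarray(phi, dtype=float)
    phi_next = np.asarray(phi_next, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    # A = sum_t phi_t (phi_t - gamma * phi_{t+1})^T
    A = phi.T @ (phi - gamma * phi_next)
    # b = sum_t phi_t r_t
    b = phi.T @ rewards
    # theta is the root of the estimating equation b - A theta = 0
    return np.linalg.solve(A, b)


# Hypothetical toy chain: the state alternates 0 -> 1 -> 0 -> ...,
# with reward 1 when leaving state 0 and reward 0 when leaving state 1.
states = [0, 1, 0, 1, 0]
features = np.eye(2)              # one-hot (tabular) features
phi = features[states[:-1]]       # features of s_t
phi_next = features[states[1:]]   # features of s_{t+1}
rewards = np.array([1.0, 0.0, 1.0, 0.0])

theta = lstd(phi, phi_next, rewards, gamma=0.5)
print(theta)
```

With one-hot features, theta is the estimated value of each state. For this deterministic chain the Bellman equations V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0) give V(0) = 4/3 and V(1) = 2/3, which the sketch recovers. No transition model is specified anywhere, only sampled transitions.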

Cite

Text

Ueno et al. "A Semiparametric Statistical Approach to Model-Free Policy Evaluation." International Conference on Machine Learning, 2008. doi:10.1145/1390156.1390291

Markdown

[Ueno et al. "A Semiparametric Statistical Approach to Model-Free Policy Evaluation." International Conference on Machine Learning, 2008.](https://mlanthology.org/icml/2008/ueno2008icml-semiparametric/) doi:10.1145/1390156.1390291

BibTeX

@inproceedings{ueno2008icml-semiparametric,
  title     = {{A Semiparametric Statistical Approach to Model-Free Policy Evaluation}},
  author    = {Ueno, Tsuyoshi and Kawanabe, Motoaki and Mori, Takeshi and Maeda, Shin-ichi and Ishii, Shin},
  booktitle = {International Conference on Machine Learning},
  year      = {2008},
  pages     = {1072-1079},
  doi       = {10.1145/1390156.1390291},
  url       = {https://mlanthology.org/icml/2008/ueno2008icml-semiparametric/}
}