Policy Evaluation with Temporal Differences: A Survey and Comparison

Abstract

Policy evaluation is an essential step in most reinforcement learning approaches. It yields a value function, an assessment of the quality of each state under a given policy, which can be used in a policy improvement step. Since the late 1980s, this research area has been dominated by temporal-difference (TD) methods due to their data efficiency. However, core issues such as stability guarantees in the off-policy scenario, improved sample efficiency, and a probabilistic treatment of the uncertainty in the estimates have only been tackled recently, which has led to a large number of new approaches.
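Since the survey's subject is TD methods for policy evaluation, a minimal sketch may help fix ideas: tabular TD(0) estimating the value function of a fixed policy on a small random-walk chain. The toy environment, step size, and discount factor below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Tabular TD(0) policy evaluation on a 5-state random-walk chain
# (illustrative assumptions; not an environment from the paper).
# Episodes start in the middle state; the fixed policy moves left or
# right with equal probability. Reaching the rightmost state yields
# reward 1, the leftmost yields 0; both are terminal.

rng = np.random.default_rng(0)
n_states = 5
gamma = 1.0              # discount factor (assumed)
alpha = 0.1              # step size (assumed)
V = np.zeros(n_states)   # value estimates; terminals stay at 0

for episode in range(5000):
    s = 2
    while s not in (0, n_states - 1):
        s_next = s + rng.choice((-1, 1))              # random policy
        r = 1.0 if s_next == n_states - 1 else 0.0
        V[s] += alpha * (r + gamma * V[s_next] - V[s])  # TD(0) update
        s = s_next

print(V[1:-1])  # converges toward the true values [0.25, 0.5, 0.75]
```

The update bootstraps on the current estimate of the successor state rather than waiting for the full return, which is the source of the data efficiency mentioned above.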

Cite

Text

Dann et al. "Policy Evaluation with Temporal Differences: A Survey and Comparison." Journal of Machine Learning Research, 15:809–883, 2014.

Markdown

[Dann et al. "Policy Evaluation with Temporal Differences: A Survey and Comparison." Journal of Machine Learning Research, 15:809–883, 2014.](https://mlanthology.org/jmlr/2014/dann2014jmlr-policy/)

BibTeX

@article{dann2014jmlr-policy,
  title     = {{Policy Evaluation with Temporal Differences: A Survey and Comparison}},
  author    = {Dann, Christoph and Neumann, Gerhard and Peters, Jan},
  journal   = {Journal of Machine Learning Research},
  year      = {2014},
  pages     = {809--883},
  volume    = {15},
  url       = {https://mlanthology.org/jmlr/2014/dann2014jmlr-policy/}
}