Policy Evaluation with Temporal Differences: A Survey and Comparison
Abstract
Policy evaluation is an essential step in most reinforcement learning approaches. It yields a value function, the quality assessment of states for a given policy, which can be used in a policy improvement step. Since the late 1980s, this research area has been dominated by temporal-difference (TD) methods due to their data efficiency. However, core issues such as stability guarantees in the off-policy scenario, improved sample efficiency, and a probabilistic treatment of the uncertainty in the estimates have only been tackled recently, which has led to a large number of new approaches.
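As a concrete illustration of the setting (this sketch is not from the paper itself): TD methods estimate the value function incrementally from sampled transitions by moving each estimate toward a bootstrapped target. A minimal tabular TD(0) example on a hypothetical random-walk chain, with the environment, episode count, and step size alpha chosen purely for illustration:

import random

def td0_policy_evaluation(num_states=7, episodes=2000, alpha=0.1, gamma=1.0):
    # Hypothetical random-walk chain: states 0..num_states-1,
    # terminal at both ends, reward 1 only on reaching the right end.
    V = [0.0] * num_states  # tabular value estimates, terminals stay 0
    for _ in range(episodes):
        s = num_states // 2  # every episode starts in the middle
        while s not in (0, num_states - 1):
            s_next = s + random.choice((-1, 1))  # uniform random policy
            r = 1.0 if s_next == num_states - 1 else 0.0
            # TD(0) update: move V(s) toward the bootstrapped target
            # r + gamma * V(s_next), i.e. follow the TD error.
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V

print(td0_policy_evaluation())

For this chain the true values of the non-terminal states are 1/6, ..., 5/6, so the printed estimates should land near those numbers; the survey compares this kind of basic TD update against its many more recent descendants.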
Cite

Text
Dann et al. "Policy Evaluation with Temporal Differences: A Survey and Comparison." Journal of Machine Learning Research, 2014.

Markdown
[Dann et al. "Policy Evaluation with Temporal Differences: A Survey and Comparison." Journal of Machine Learning Research, 2014.](https://mlanthology.org/jmlr/2014/dann2014jmlr-policy/)

BibTeX
@article{dann2014jmlr-policy,
title = {{Policy Evaluation with Temporal Differences: A Survey and Comparison}},
author = {Dann, Christoph and Neumann, Gerhard and Peters, Jan},
journal = {Journal of Machine Learning Research},
year = {2014},
pages = {809--883},
volume = {15},
url = {https://mlanthology.org/jmlr/2014/dann2014jmlr-policy/}
}