Policy Evaluation Using the Ω-Return

Abstract

We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of temporal-difference learning algorithms. The benefit of the Ω-return is that it accounts for the correlation of different-length returns. Because it is difficult to compute exactly, we suggest one way of approximating the Ω-return. We provide empirical studies suggesting that the Ω-return is superior to the λ-return and the γ-return for a variety of problems.
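For context, the λ-return referenced in the abstract is the standard geometrically weighted mixture of n-step returns from the TD(λ) literature; this background definition is not given on this page, and the notation below (n-step return G_t^{(n)}, reward R, discount γ, value estimate v̂) follows common convention rather than this paper's:

G_t^{\lambda} = (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)},
\qquad
G_t^{(n)} = \sum_{k=0}^{n-1} \gamma^{k} R_{t+k+1} + \gamma^{n} \hat{v}(S_{t+n}).

Per the abstract, the Ω-return replaces these fixed geometric weights with a weighting that accounts for the correlations among the different-length returns G_t^{(n)}, which the λ-return ignores.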

Cite

Text

Thomas et al. "Policy Evaluation Using the Ω-Return." Neural Information Processing Systems, 2015.

Markdown

[Thomas et al. "Policy Evaluation Using the Ω-Return." Neural Information Processing Systems, 2015.](https://mlanthology.org/neurips/2015/thomas2015neurips-policy/)

BibTeX

@inproceedings{thomas2015neurips-policy,
  title     = {{Policy Evaluation Using the Ω-Return}},
  author    = {Thomas, Philip S. and Niekum, Scott and Theocharous, Georgios and Konidaris, George},
  booktitle = {Neural Information Processing Systems},
  year      = {2015},
  pages     = {334--342},
  url       = {https://mlanthology.org/neurips/2015/thomas2015neurips-policy/}
}