Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

Josiah P. Hanna, Peter Stone, Scott Niekum

AAAI 2017 pp. 4933-4934

doi:10.1609/AAAI.V31I1.11123

Abstract

In many reinforcement learning applications, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. In this context, we propose two bootstrapping off-policy evaluation methods that use learned MDP transition models to estimate lower confidence bounds on policy performance with limited data. We empirically evaluate the proposed methods on standard policy evaluation tasks.
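
The general idea in the abstract (resample the logged data, fit a transition model to each resample, evaluate the target policy inside each model, and read a lower bound off the resulting distribution) can be illustrated with a minimal sketch. This is not the authors' implementation and does not reproduce either of the paper's two methods. It assumes a small tabular MDP, trajectories stored as lists of `(state, action, reward, next_state)` tuples, a maximum-likelihood count-based model, and a plain percentile bootstrap. All function names, parameters, and the handling of unvisited state-action pairs are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def fit_model(trajectories, n_states, n_actions):
    """Fit a maximum-likelihood tabular model from (s, a, r, s_next) tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    start_counts = [0] * n_states
    for traj in trajectories:
        start_counts[traj[0][0]] += 1
        for s, a, r, s_next in traj:
            counts[(s, a)][s_next] += 1
            reward_sum[(s, a)] += r
    P, R = {}, {}
    for s in range(n_states):
        for a in range(n_actions):
            n = sum(counts[(s, a)].values())
            if n == 0:
                # Assumption (not from the paper): unvisited pairs
                # self-loop with zero reward.
                P[(s, a)], R[(s, a)] = {s: 1.0}, 0.0
            else:
                P[(s, a)] = {s2: c / n for s2, c in counts[(s, a)].items()}
                R[(s, a)] = reward_sum[(s, a)] / n
    total = sum(start_counts)
    d0 = [c / total for c in start_counts]  # empirical start distribution
    return P, R, d0


def model_value(model, policy, n_states, n_actions, gamma, horizon):
    """Finite-horizon value of `policy` inside the learned model."""
    P, R, d0 = model
    v = [0.0] * n_states
    for _ in range(horizon):
        v = [
            sum(
                policy[s][a]
                * (R[(s, a)] + gamma * sum(p * v[s2] for s2, p in P[(s, a)].items()))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
    return sum(d0[s] * v[s] for s in range(n_states))


def bootstrap_lower_bound(trajectories, policy, n_states, n_actions,
                          delta=0.05, n_boot=200, gamma=1.0, horizon=10, seed=0):
    """Percentile-bootstrap (1 - delta) lower bound on the policy's value."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample whole trajectories with replacement.
        resample = rng.choices(trajectories, k=len(trajectories))
        model = fit_model(resample, n_states, n_actions)
        estimates.append(
            model_value(model, policy, n_states, n_actions, gamma, horizon)
        )
    estimates.sort()
    # The delta-quantile of the bootstrap estimates is the lower bound.
    return estimates[min(int(delta * n_boot), n_boot - 1)]
```

Here `policy[s][a]` is the probability that the target policy takes action `a` in state `s`. A bound obtained this way is approximate: it inherits any bias of the learned model, so it trades the strict guarantees of concentration-inequality bounds for better data efficiency.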


Cite

Text

Hanna et al. "Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation." AAAI Conference on Artificial Intelligence, 2017. doi:10.1609/AAAI.V31I1.11123

Markdown

[Hanna et al. "Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation." AAAI Conference on Artificial Intelligence, 2017.](https://mlanthology.org/aaai/2017/hanna2017aaai-bootstrapping/) doi:10.1609/AAAI.V31I1.11123

BibTeX

@inproceedings{hanna2017aaai-bootstrapping,
  title     = {{Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation}},
  author    = {Hanna, Josiah P. and Stone, Peter and Niekum, Scott},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {4933-4934},
  doi       = {10.1609/AAAI.V31I1.11123},
  url       = {https://mlanthology.org/aaai/2017/hanna2017aaai-bootstrapping/}
}