Improving PAC Exploration Using the Median of Means
Abstract
We present the first application of the median of means in a PAC exploration algorithm for MDPs. Using the median of means allows us to significantly reduce the dependence of our bounds on the range of values that the value function can take, while introducing a dependence on the (potentially much smaller) variance of the Bellman operator. Additionally, our algorithm is the first algorithm with PAC bounds that can be applied to MDPs with unbounded rewards.
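For readers unfamiliar with the estimator named in the abstract, the following is a minimal sketch of the generic median-of-means estimator only, not the PAC exploration algorithm from the paper. The function name `median_of_means`, the `num_groups` parameter, and the choice to drop leftover samples that do not fill a group are assumptions made for illustration.

```python
import statistics


def median_of_means(samples, num_groups):
    """Generic median-of-means estimate of the mean of `samples`.

    Illustrative sketch (hypothetical helper, not taken from the paper):
    split the samples into `num_groups` equal-sized groups, average each
    group, and return the median of those group averages.
    """
    if num_groups < 1 or num_groups > len(samples):
        raise ValueError("num_groups must be between 1 and len(samples)")

    group_size = len(samples) // num_groups
    # Assumption: samples left over after filling the groups are ignored.
    group_means = [
        statistics.mean(samples[i * group_size:(i + 1) * group_size])
        for i in range(num_groups)
    ]
    return statistics.median(group_means)


if __name__ == "__main__":
    data = [1, 1, 1, 1, 1, 1000]  # one extreme outlier
    print("plain mean:     ", statistics.mean(data))
    print("median of means:", median_of_means(data, num_groups=3))
```

In this toy input a single extreme sample shifts the plain mean substantially, while the median of the group means is unaffected because the outlier contaminates only one group.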
Cite
Text
Pazis et al. "Improving PAC Exploration Using the Median of Means." Neural Information Processing Systems, 2016.
Markdown
[Pazis et al. "Improving PAC Exploration Using the Median of Means." Neural Information Processing Systems, 2016.](https://mlanthology.org/neurips/2016/pazis2016neurips-improving/)
BibTeX
@inproceedings{pazis2016neurips-improving,
title = {{Improving PAC Exploration Using the Median of Means}},
author = {Pazis, Jason and Parr, Ronald E and How, Jonathan P},
booktitle = {Neural Information Processing Systems},
year = {2016},
pages = {3898-3906},
url = {https://mlanthology.org/neurips/2016/pazis2016neurips-improving/}
}