Efficient Inference and Exploration for Reinforcement Learning
Abstract
Despite an ever-growing literature on reinforcement learning algorithms and applications, much less is known about their statistical inference. In this paper, we investigate the large-sample behavior of Q-value estimates and provide closed-form characterizations of their asymptotic variances. This allows us to efficiently construct confidence regions for Q-values and optimal value functions, and to develop policies that minimize their estimation errors. It also leads to a policy exploration strategy based on estimating the relative discrepancies among the Q-value estimates. Numerical experiments show that our exploration strategy outperforms benchmark approaches.
Cite
Text
Zhu et al. "Efficient Inference and Exploration for Reinforcement Learning." International Conference on Learning Representations, 2020.
Markdown
[Zhu et al. "Efficient Inference and Exploration for Reinforcement Learning." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/zhu2020iclr-efficient/)
BibTeX
@inproceedings{zhu2020iclr-efficient,
title = {{Efficient Inference and Exploration for Reinforcement Learning}},
author = {Zhu, Yi and Dong, Jing and Lam, Henry},
booktitle = {International Conference on Learning Representations},
year = {2020},
url = {https://mlanthology.org/iclr/2020/zhu2020iclr-efficient/}
}