Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Zhou, Dongruo; He, Jiafan; Gu, Quanquan

ICML 2021 pp. 12793-12802

Abstract

Modern tasks in reinforcement learning have large state and action spaces. To deal with them efficiently, one often uses a predefined feature mapping to represent states and actions in a low-dimensional space. In this paper, we study reinforcement learning for discounted Markov Decision Processes (MDPs), where the transition kernel can be parameterized as a linear function of a certain feature mapping. We propose a novel algorithm that makes use of the feature mapping and obtains a $\tilde O(d\sqrt{T}/(1-\gamma)^2)$ regret, where $d$ is the dimension of the feature space, $T$ is the time horizon, and $\gamma$ is the discount factor of the MDP. To the best of our knowledge, this is the first polynomial regret bound without accessing a generative model or making strong assumptions such as ergodicity of the MDP. By constructing a special class of MDPs, we also show that for any algorithm, the regret is lower bounded by $\Omega(d\sqrt{T}/(1-\gamma)^{1.5})$. Together, our upper and lower bounds suggest that the proposed reinforcement learning algorithm is near-optimal up to a $(1-\gamma)^{-0.5}$ factor.
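
The $(1-\gamma)^{-0.5}$ gap follows from dividing the upper bound by the lower bound. The short calculation below is only a sketch: it ignores the logarithmic factors hidden by $\tilde O$ and all constants, and the value $\gamma = 0.99$ is an illustrative assumption, not a setting taken from the paper.

$$
\frac{d\sqrt{T}/(1-\gamma)^{2}}{d\sqrt{T}/(1-\gamma)^{1.5}}
= (1-\gamma)^{1.5-2}
= (1-\gamma)^{-0.5},
\qquad
\text{e.g. } \gamma = 0.99 \;\Rightarrow\; (0.01)^{-0.5} = 10 .
$$

The dependence on $d$ and $T$ cancels exactly, so under these simplifications the two bounds differ only in how they scale with the effective horizon $1/(1-\gamma)$.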

Cite

Text

Zhou et al. "Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping." International Conference on Machine Learning, 2021.

Markdown

[Zhou et al. "Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/zhou2021icml-provably/)

BibTeX

@inproceedings{zhou2021icml-provably,
  title     = {{Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping}},
  author    = {Zhou, Dongruo and He, Jiafan and Gu, Quanquan},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {12793-12802},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/zhou2021icml-provably/}
}