Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Abstract

Reinforcement learning (RL) algorithms combined with modern function approximators such as kernel functions and deep neural networks have achieved significant empirical successes in large-scale application problems with a massive number of states. From a theoretical perspective, however, RL with function approximation poses a fundamental challenge to developing algorithms with provable computational and statistical efficiency, due to the need to take into consideration both the exploration-exploitation tradeoff that is inherent in RL and the bias-variance tradeoff that is innate in statistical estimation. To address such a challenge, focusing on the episodic setting where the action-value functions are represented by a kernel function or over-parametrized neural network, we propose the first provable RL algorithm with both polynomial runtime and sample complexity, without additional assumptions on the data-generating model. In particular, for both the kernel and neural settings, we prove that an optimistic modification of the least-squares value iteration algorithm incurs an $\tilde{\mathcal{O}}(\delta_{\mathcal{F}} H^2 \sqrt{T})$ regret, where $\delta_{\mathcal{F}}$ characterizes the intrinsic complexity of the function class $\mathcal{F}$, $H$ is the length of each episode, and $T$ is the total number of episodes. Our regret bounds are independent of the number of states and therefore even allow it to diverge, which exhibits the benefit of function approximation.
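The core mechanism in the kernel setting is a least-squares (kernel ridge) regression fit of the value targets, made optimistic by adding an exploration bonus proportional to the predictive uncertainty of the fit. The sketch below, which is not the paper's algorithm but a minimal illustration of this optimistic-regression step, uses an RBF kernel and a bonus scale `beta` (both hypothetical choices); the bonus shrinks near observed data and stays large in unexplored regions, which is what drives exploration.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """RBF kernel matrix between rows of x and rows of y."""
    d = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def optimistic_estimate(X, y, X_query, lam=1.0, beta=1.0):
    """Kernel ridge regression with an uncertainty-based exploration bonus.

    Returns f_hat(x) + beta * sigma(x), where
    sigma(x)^2 = k(x, x) - k_x^T (K + lam I)^{-1} k_x
    is small near observed data and large far from it.
    (Illustrative sketch only; `beta`, `lam`, and the RBF kernel are
    assumptions, not the constants or kernel used in the paper.)
    """
    K = rbf(X, X)
    A = K + lam * np.eye(len(X))
    k_q = rbf(X_query, X)                       # shape (m, n)
    mean = k_q @ np.linalg.solve(A, y)          # ridge-regression prediction
    var = rbf(X_query, X_query).diagonal() - np.einsum(
        "ij,ij->i", k_q, np.linalg.solve(A, k_q.T).T
    )
    bonus = np.sqrt(np.maximum(var, 0.0))       # clip tiny negatives
    return mean + beta * bonus, bonus
```

In the full algorithm this optimistic estimate would replace the plain least-squares fit at each step of value iteration, so that the greedy policy is computed with respect to an upper bound on the action-value function.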

Cite

Text

Yang et al. "Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations." Neural Information Processing Systems, 2020.

Markdown

[Yang et al. "Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/yang2020neurips-provably/)

BibTeX

@inproceedings{yang2020neurips-provably,
  title     = {{Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations}},
  author    = {Yang, Zhuoran and Jin, Chi and Wang, Zhaoran and Wang, Mengdi and Jordan, Michael I.},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/yang2020neurips-provably/}
}