Agnostic $q$-Learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity

Du, Simon S; Lee, Jason; Mahajan, Gaurav; Wang, Ruosong

Agnostic $q$-Learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity

Simon S Du, Jason Lee, Gaurav Mahajan, Ruosong Wang

NeurIPS 2020

/neurips/2020/du2020neurips-agnostic/

Abstract

The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge 0$. We propose a novel recursion-based algorithm and show that if $\delta = O\left(\rho/\sqrt{\dim_E}\right)$, then one can find the optimal policy using $O(\dim_E)$ trajectories, where $\rho$ is the gap between the optimal $Q$-value of the best actions and that of the second-best actions and $\dim_E$ is the Eluder dimension of $\mathcal{F}$. Our result has two implications: \begin{enumerate} \it{em} In conjunction with the lower bound in [Du et al., 2020], our upper bound suggests that the condition $\delta = \widetilde{\Theta}\left(\rho/\sqrt{\dim_E}\right)$ is necessary and sufficient for algorithms with polynomial sample complexity. \it{em} In conjunction with the obvious lower bound in the tabular case, our upper bound suggests that the sample complexity $\widetilde{\Theta}\left(\dim_E\right)$ is tight in the agnostic setting. \end{enumerate} Therefore, we help address the open problem on agnostic $Q$-learning proposed in [Wen and Van Roy, 2013]. We further extend our algorithm to the stochastic reward setting and obtain similar results.

PDF NeurIPS Semantic Scholar

Cite

Text

Du et al. "Agnostic $q$-Learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity." Neural Information Processing Systems, 2020.

Markdown

[Du et al. "Agnostic $q$-Learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/du2020neurips-agnostic/)

BibTeX

@inproceedings{du2020neurips-agnostic,
  title     = {{Agnostic $q$-Learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity}},
  author    = {Du, Simon S and Lee, Jason and Mahajan, Gaurav and Wang, Ruosong},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/du2020neurips-agnostic/}
}