On Well-Posedness and Minimax Optimal Rates of Nonparametric Q-Function Estimation in Off-Policy Evaluation

Abstract

We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions. We recast $Q$-function estimation as a special form of the nonparametric instrumental variables (NPIV) estimation problem. We first show that, under one mild condition, the NPIV formulation of $Q$-function estimation is well-posed in the sense of the $L^2$-measure of ill-posedness with respect to the data-generating distribution, bypassing a strong assumption on the discount factor $\gamma$ imposed in the recent literature to obtain $L^2$ convergence rates for various $Q$-function estimators. Thanks to this well-posedness property, we derive the first minimax lower bounds on the convergence rates of nonparametric estimation of the $Q$-function and its derivatives in both the sup-norm and the $L^2$-norm, which are shown to be the same as those for classical nonparametric regression (Stone, 1982). We then propose a sieve two-stage least squares estimator and establish its rate-optimality in both norms under some mild conditions. Our general results on well-posedness and the minimax lower bounds are of independent interest for studying not only other nonparametric estimators of the $Q$-function but also efficient estimation of the value of any target policy in off-policy settings.
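To make the NPIV view concrete, the following is a minimal sketch of a generic sieve two-stage least squares fit, not the authors' implementation. It assumes a deterministic target policy $\pi$, so that the Bellman equation gives the conditional moment restriction $E[R + \gamma Q(S', \pi(S')) - Q(S, A) \mid S, A] = 0$, with $(S, A)$ acting as instruments. The function name `sieve_2sls_q`, the polynomial bases `psi` and `b`, and the simulated linear MDP are all hypothetical choices made for illustration.

```python
import numpy as np

def sieve_2sls_q(S, A, R, S_next, pi, gamma, psi, b):
    """Generic sieve 2SLS sketch: approximate Q by psi(s, a) @ coef."""
    # "Endogenous" regressors: psi(S, A) - gamma * psi(S', pi(S')).
    Gamma = psi(S, A) - gamma * psi(S_next, pi(S_next))
    # Instrument sieve evaluated at the conditioning variables (S, A).
    B = b(S, A)
    # First stage: project each column of Gamma onto the span of B.
    Gamma_hat = B @ np.linalg.lstsq(B, Gamma, rcond=None)[0]
    # Second stage: least squares of rewards on the projected regressors.
    coef = np.linalg.lstsq(Gamma_hat, R, rcond=None)[0]
    return coef

# Hypothetical sieve bases (low-order polynomials, chosen for illustration).
def psi(s, a):
    return np.column_stack([np.ones_like(s), s, a])

def b(s, a):
    return np.column_stack([np.ones_like(s), s, a, s**2, a**2, s * a])

# Simulated data from an assumed linear MDP:
#   S' = 0.5 * S + 0.5 * A + noise,  R = S + A,  target policy pi(s) = 0.
rng = np.random.default_rng(0)
n, gamma = 5000, 0.9
S = rng.uniform(-1.0, 1.0, n)
A = rng.uniform(-1.0, 1.0, n)          # behavior policy: uniform actions
S_next = 0.5 * S + 0.5 * A + 0.1 * rng.standard_normal(n)
R = S + A

def pi(s):
    return np.zeros_like(s)            # deterministic target policy

coef = sieve_2sls_q(S, A, R, S_next, pi, gamma, psi, b)
# In this toy model the true Q-function is c * (s + a), c = 1 / (1 - 0.5 * gamma).
c_true = 1.0 / (1.0 - 0.5 * gamma)
print("estimated coefficients:", coef)
print("true slope:", c_true)
```

In this toy model the residual $R - \Gamma c^*$ depends on the transition noise, which is correlated with the regressor built from $S'$, so ordinary least squares of $R$ on $\Gamma$ would be biased; projecting onto functions of $(S, A)$ first removes that correlation. When `b` equals `psi`, the sketch reduces to a least-squares temporal-difference style fit.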

Cite

Text

Chen and Qi. "On Well-Posedness and Minimax Optimal Rates of Nonparametric Q-Function Estimation in Off-Policy Evaluation." International Conference on Machine Learning, 2022.

Markdown

[Chen and Qi. "On Well-Posedness and Minimax Optimal Rates of Nonparametric Q-Function Estimation in Off-Policy Evaluation." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/chen2022icml-wellposedness/)

BibTeX

@inproceedings{chen2022icml-wellposedness,
  title     = {{On Well-Posedness and Minimax Optimal Rates of Nonparametric Q-Function Estimation in Off-Policy Evaluation}},
  author    = {Chen, Xiaohong and Qi, Zhengling},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {3558--3582},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/chen2022icml-wellposedness/}
}