Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes

Abstract

We study semiparametrically efficient estimation in off-policy evaluation (OPE) where the underlying Markov decision process (MDP) is linear with a known feature map. We characterize the variance lower bound for regular estimators in the linear MDP setting and propose an efficient estimator whose variance achieves that lower bound. Consistency and asymptotic normality of our estimator are established under mild conditions, which merely require that the only infinite-dimensional nuisance parameter be estimated at an $n^{-1/4}$ convergence rate. We also construct an asymptotically valid confidence interval for statistical inference and conduct simulation studies to validate our results. To our knowledge, this is the first work in the OPE literature to address efficient estimation in the presence of a known MDP structure.
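For context on the setting the abstract describes: under the standard linear MDP assumption, transition dynamics and rewards are linear in a known $d$-dimensional feature map $\phi(s,a)$, which implies that the action-value function of any policy $\pi$ is also linear, $Q^\pi(s,a) = \phi(s,a)^\top w^\pi$. The sketch below is a minimal, illustrative plug-in OPE baseline that exploits this structure via off-policy LSTDQ. It is not the paper's semiparametrically efficient estimator; all names (`phi_sa`, `phi_next_pi`, `phi_init_pi`) are hypothetical placeholders for feature matrices computed from logged data and the target policy.

```python
# Illustrative sketch only: a plug-in LSTDQ-style OPE estimator for a linear MDP.
# This is NOT the efficient estimator proposed in the paper; it is a standard
# baseline that exploits the known feature map phi(s, a).
import numpy as np

def lstd_ope(phi_sa, phi_next_pi, rewards, phi_init_pi, gamma=0.95, ridge=1e-6):
    """Estimate the target policy's value by solving the projected Bellman
    equation for w in Q^pi(s, a) = phi(s, a)^T w, then averaging over
    initial states.

    phi_sa      : (n, d) features phi(s_i, a_i) of logged state-action pairs
    phi_next_pi : (n, d) expected next-state features E_{a'~pi}[phi(s'_i, a')]
    rewards     : (n,)   observed rewards r_i
    phi_init_pi : (m, d) expected initial features E_{a~pi}[phi(s_0, a)]
    """
    d = phi_sa.shape[1]
    # LSTDQ normal equations: (Phi^T (Phi - gamma * Phi')) w = Phi^T r,
    # with a small ridge term for numerical stability.
    A = phi_sa.T @ (phi_sa - gamma * phi_next_pi) + ridge * np.eye(d)
    b = phi_sa.T @ rewards
    w = np.linalg.solve(A, b)
    # Plug-in value estimate: average initial-state Q-value under pi.
    return float(np.mean(phi_init_pi @ w))
```

A plug-in estimator of this form is consistent under the linear MDP assumption but, unlike the estimator the paper proposes, it carries no guarantee of attaining the semiparametric variance lower bound.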

Cite

Text

Xie et al. "Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes." International Conference on Machine Learning, 2023.

Markdown

[Xie et al. "Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/xie2023icml-semiparametrically/)

BibTeX

@inproceedings{xie2023icml-semiparametrically,
  title     = {{Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes}},
  author    = {Xie, Chuhan and Yang, Wenhao and Zhang, Zhihua},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {38227--38257},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/xie2023icml-semiparametrically/}
}