Model Selection in Reinforcement Learning

Abstract

We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting when the goal is to find an action-value function with the smallest Bellman error among a countable set of candidates functions. We propose a complexity regularization-based model selection algorithm, $\ensuremath{\mbox{\textsc {BErMin}}}$ , and prove that it enjoys an oracle-like property: the estimator’s error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and a small remainder term that vanishes at a parametric rate as the number of samples increases. As an application, we consider a problem when the true action-value function belongs to an unknown member of a nested sequence of function spaces. We show that under some additional technical conditions $\ensuremath{\mbox{\textsc {BErMin}}}$ leads to a procedure whose rate of convergence, up to a constant factor, matches that of an oracle who knows which of the nested function spaces the true action-value function belongs to, i.e., the procedure achieves adaptivity .

Cite

Text

Farahmand and Szepesvári. "Model Selection in Reinforcement Learning." Machine Learning, 2011. doi:10.1007/S10994-011-5254-7

Markdown

[Farahmand and Szepesvári. "Model Selection in Reinforcement Learning." Machine Learning, 2011.](https://mlanthology.org/mlj/2011/farahmand2011mlj-model/) doi:10.1007/S10994-011-5254-7

BibTeX

@article{farahmand2011mlj-model,
  title     = {{Model Selection in Reinforcement Learning}},
  author    = {Farahmand, Amir Massoud and Szepesvári, Csaba},
  journal   = {Machine Learning},
  year      = {2011},
  pages     = {299-332},
  doi       = {10.1007/S10994-011-5254-7},
  volume    = {85},
  url       = {https://mlanthology.org/mlj/2011/farahmand2011mlj-model/}
}