Is Plug-in Solver Sample-Efficient for Feature-Based Reinforcement Learning?

Abstract

It is believed that a model-based approach for reinforcement learning (RL) is the key to reducing sample complexity. However, a precise understanding of the sample optimality of model-based RL is still largely missing, even for the linear case. This work considers the sample complexity of finding an $\epsilon$-optimal policy in a Markov decision process (MDP) that admits a linear additive feature representation, given only access to a generative model. We solve this problem via a plug-in solver approach, which builds an empirical model and plans in this empirical model via an arbitrary plug-in solver. We prove that under the anchor-state assumption, which implies implicit non-negativity in the feature space, the minimax sample complexity of finding an $\epsilon$-optimal policy in a $\gamma$-discounted MDP is $O(K/((1-\gamma)^3\epsilon^2))$, which depends only on the dimensionality $K$ of the feature space and has no dependence on the size of the state or action space. We further extend our results to a relaxed setting where anchor states may not exist and show that a plug-in approach can be sample efficient as well, providing a flexible approach to designing model-based algorithms for RL.
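
The following is a minimal tabular sketch of the plug-in solver paradigm described in the abstract: query the generative model a fixed number of times per state-action pair, build an empirical MDP, and plan in it with an arbitrary solver (here, value iteration). It deliberately ignores the linear feature representation and anchor-state machinery that the paper's analysis relies on, and all function and parameter names (`build_empirical_model`, `plan`, `num_samples`, etc.) are illustrative assumptions, not taken from the paper.

```python
# Sketch of the plug-in solver approach, assuming generative-model access:
# estimate the transition model from samples, then run any planner on it.
import numpy as np

def build_empirical_model(sample_next_state, n_states, n_actions, num_samples, rng):
    """Estimate transition probabilities by querying the generative model."""
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(num_samples):
                s_next = sample_next_state(s, a, rng)
                P_hat[s, a, s_next] += 1.0
    return P_hat / num_samples

def plan(P_hat, rewards, gamma, n_iters=1000):
    """Arbitrary plug-in solver: here, standard value iteration."""
    n_states, n_actions, _ = P_hat.shape
    V = np.zeros(n_states)
    for _ in range(n_iters):
        Q = rewards + gamma * (P_hat @ V)   # shape (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1), V              # greedy policy and its value estimate

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, n_actions, gamma = 5, 3, 0.9

    # A small random MDP standing in for the true (unknown) environment.
    P_true = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    rewards = rng.uniform(size=(n_states, n_actions))

    def sample_next_state(s, a, rng):
        # Generative-model access: one sampled transition per query.
        return rng.choice(n_states, p=P_true[s, a])

    P_hat = build_empirical_model(sample_next_state, n_states, n_actions,
                                  num_samples=200, rng=rng)
    policy, V_hat = plan(P_hat, rewards, gamma)
    print("greedy policy in the empirical MDP:", policy)
```

In the paper's feature-based setting, the empirical model would instead be estimated only at the $K$ anchor state-action pairs and extended to the whole MDP through the linear feature representation, which is what removes the dependence on the size of the state and action spaces.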

Cite

Text

Cui and Yang. "Is Plug-in Solver Sample-Efficient for Feature-Based Reinforcement Learning?" Neural Information Processing Systems, 2020.

Markdown

[Cui and Yang. "Is Plug-in Solver Sample-Efficient for Feature-Based Reinforcement Learning?" Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/cui2020neurips-plugin/)

BibTeX

@inproceedings{cui2020neurips-plugin,
  title     = {{Is Plug-in Solver Sample-Efficient for Feature-Based Reinforcement Learning?}},
  author    = {Cui, Qiwen and Yang, Lin},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/cui2020neurips-plugin/}
}