A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning
Abstract
With the increasing need to handle large state and action spaces, general function approximation has become a key technique in reinforcement learning (RL). In this paper, we propose a unified framework that integrates both model-based and model-free reinforcement learning and subsumes nearly all Markov decision process (MDP) models in the existing literature on tractable RL. We propose a novel estimation function with decomposable structural properties for optimization-based exploration, and we use the functional Eluder dimension with respect to an admissible Bellman characterization function as a complexity measure of the model class. Under our framework, we propose a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA), which achieves regret bounds that match or improve over the best-known results for a variety of MDP models. In particular, for MDPs with low Witness rank, under a slightly stronger assumption, OPERA improves the state-of-the-art sample complexity by a factor of $dH$. Our framework provides a generic interface for studying and designing new RL models and algorithms.
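To make the "optimization-based exploration" idea concrete, below is a minimal sketch of the general pattern the abstract alludes to: in each episode, restrict attention to hypotheses whose cumulative estimation loss on past data is small (a confidence set), pick the most optimistic one, and act greedily with respect to it. This is an illustrative assumption about the template only, not the paper's actual algorithm or interface; all names (`estimation_loss`, `beta`, `collect_episode`, etc.) are hypothetical placeholders.

```python
# Sketch of constraint-based optimistic exploration (OPERA-style template).
# All abstractions below are illustrative assumptions, not the paper's API.

from typing import Any, Callable, List, Tuple


def optimistic_exploration(
    hypotheses: List[Any],                              # finite hypothesis class, for illustration
    value_at_initial_state: Callable[[Any], float],     # optimistic objective V_f(s_1)
    estimation_loss: Callable[[Any, Tuple], float],     # decomposable estimation function on one sample
    greedy_policy: Callable[[Any], Callable],           # greedy policy induced by a hypothesis
    collect_episode: Callable[[Callable], List[Tuple]], # roll out a policy, return transition tuples
    num_episodes: int,
    beta: float,                                        # confidence-set radius
) -> List[Tuple]:
    """Each episode: choose the most optimistic hypothesis whose cumulative
    estimation loss on past data stays below beta, then act greedily."""
    dataset: List[Tuple] = []
    for _ in range(num_episodes):
        # Confidence set: hypotheses still consistent with the collected data.
        feasible = [
            f for f in hypotheses
            if sum(estimation_loss(f, z) for z in dataset) <= beta
        ]
        # Optimism in the face of uncertainty: maximize predicted initial value.
        f_opt = max(feasible, key=value_at_initial_state)
        # Execute the induced greedy policy and log the new transitions.
        dataset.extend(collect_episode(greedy_policy(f_opt)))
    return dataset
```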
Cite

Text:
Chen et al. "A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning." NeurIPS 2022 Workshops: DeepRL, 2022.

Markdown:
[Chen et al. "A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning." NeurIPS 2022 Workshops: DeepRL, 2022.](https://mlanthology.org/neuripsw/2022/chen2022neuripsw-general/)

BibTeX:
@inproceedings{chen2022neuripsw-general,
  title     = {{A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning}},
  author    = {Chen, Zixiang and Li, Chris Junchi and Yuan, Angela and Gu, Quanquan and Jordan, Michael},
  booktitle = {NeurIPS 2022 Workshops: DeepRL},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/chen2022neuripsw-general/}
}