Model-Free Reinforcement Learning with the Decision-Estimation Coefficient
Abstract
We consider the problem of interactive decision making, encompassing structured bandits and reinforcement learning with general function approximation. Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient, a measure of statistical complexity that lower bounds the optimal regret for interactive decision making, as well as a meta-algorithm, Estimation-to-Decisions, which achieves upper bounds in terms of the same quantity. Estimation-to-Decisions is a reduction that lifts algorithms for (supervised) online estimation into algorithms for decision making. In this paper, we show that by combining Estimation-to-Decisions with a specialized form of "optimistic" estimation introduced by Zhang (2022), it is possible to obtain guarantees that improve upon those of Foster et al. (2021) by accommodating more lenient notions of estimation error. We use this approach to derive regret bounds for model-free reinforcement learning with value function approximation, and give structural results showing when it can and cannot help more generally.
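For reference, the abstract's central quantity can be stated as follows. This is a sketch of the Decision-Estimation Coefficient as defined in Foster et al. (2021), restated here in that paper's notation; it is not part of the original abstract:

\[
\mathsf{dec}_\gamma(\mathcal{M}, \widehat{M})
  = \inf_{p \in \Delta(\Pi)} \sup_{M \in \mathcal{M}}
    \mathbb{E}_{\pi \sim p}\!\left[
      f^{M}(\pi_M) - f^{M}(\pi)
      - \gamma \cdot D^2_{\mathsf{H}}\!\big(M(\pi), \widehat{M}(\pi)\big)
    \right]
\]

Here \(\Pi\) is the decision space, \(f^{M}(\pi)\) is the mean reward of decision \(\pi\) under model \(M\), \(\pi_M\) is the optimal decision for \(M\), \(\widehat{M}\) is the estimate supplied by the online estimation oracle, and \(D^2_{\mathsf{H}}\) is the squared Hellinger distance between observation distributions. Estimation-to-Decisions plays the distribution \(p\) attaining the infimum at each round, trading off estimation error under \(\widehat{M}\) against regret at rate \(\gamma\).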
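To make the reduction concrete, the following is a minimal, illustrative sketch of the Estimation-to-Decisions loop for a finite Bernoulli bandit model class. It is our own simplification, not the paper's algorithm: the estimation oracle is plain empirical risk minimization (rather than the optimistic estimation studied in the paper), and the min-max defining the DEC is solved exactly as a linear program, which is possible here because the objective is linear in p and the supremum ranges over finitely many models. Helper names such as hellinger_sq and e2d_step are hypothetical.

import numpy as np
from scipy.optimize import linprog

def hellinger_sq(a, b):
    # Squared Hellinger distance between Bernoulli(a) and Bernoulli(b):
    # H^2 = (1/2) * [(sqrt(a)-sqrt(b))^2 + (sqrt(1-a)-sqrt(1-b))^2].
    return 0.5 * ((np.sqrt(a) - np.sqrt(b)) ** 2
                  + (np.sqrt(1 - a) - np.sqrt(1 - b)) ** 2)

def e2d_step(models, m_hat, gamma):
    # One E2D decision: solve inf_p sup_M E_{pi~p}[f^M(pi_M) - f^M(pi)
    # - gamma * H^2(M(pi), m_hat(pi))] as an LP in (p, t): minimize t
    # subject to payoff_M . p <= t for every model M, with p in the simplex.
    n_models, n_arms = models.shape
    payoff = (models.max(axis=1, keepdims=True) - models
              - gamma * hellinger_sq(models, m_hat[None, :]))
    c = np.zeros(n_arms + 1)
    c[-1] = 1.0                                          # objective: minimize t
    A_ub = np.hstack([payoff, -np.ones((n_models, 1))])  # payoff @ p - t <= 0
    b_ub = np.zeros(n_models)
    A_eq = np.hstack([np.ones((1, n_arms)), np.zeros((1, 1))])  # sum(p) = 1
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n_arms + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    p = np.clip(res.x[:n_arms], 0.0, None)
    return p / p.sum()

rng = np.random.default_rng(0)
models = rng.uniform(0.1, 0.9, size=(20, 5))  # candidate Bernoulli mean-reward vectors
true_model = models[3]                        # realizable: the truth lies in the class
counts, sums = np.zeros(5), np.zeros(5)
for t in range(200):
    # ERM estimation oracle: pick the model closest (count-weighted squared
    # error) to the empirical means; the paper instead uses optimistic estimation.
    emp = sums / np.maximum(counts, 1)
    m_hat = models[np.argmin((counts * (models - emp) ** 2).sum(axis=1))]
    p = e2d_step(models, m_hat, gamma=np.sqrt(t + 1.0))
    arm = rng.choice(5, p=p)
    reward = rng.binomial(1, true_model[arm])
    counts[arm] += 1
    sums[arm] += reward

The schedule gamma_t proportional to sqrt(t) used here is a standard tuning for E2D-type bounds; the paper's improved guarantees hinge on the choice of estimation oracle, which this sketch deliberately keeps simple.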
Cite
Text
Foster et al. "Model-Free Reinforcement Learning with the Decision-Estimation Coefficient." Neural Information Processing Systems, 2023.

Markdown
[Foster et al. "Model-Free Reinforcement Learning with the Decision-Estimation Coefficient." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/foster2023neurips-modelfree/)

BibTeX
@inproceedings{foster2023neurips-modelfree,
  title     = {{Model-Free Reinforcement Learning with the Decision-Estimation Coefficient}},
  author    = {Foster, Dylan J and Golowich, Noah and Qian, Jian and Rakhlin, Alexander and Sekhari, Ayush},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/foster2023neurips-modelfree/}
}