Model Selection in Reinforcement Learning with General Function Approximations
Abstract
We consider model selection for classic Reinforcement Learning (RL) environments, namely Multi-Armed Bandits (MABs) and Markov Decision Processes (MDPs), under general function approximations. In the model selection framework, we do not know the function classes $\mathcal{F}$ and $\mathcal{M}$ in which the true models (the reward-generating function for MABs and the transition kernel for MDPs, respectively) lie. Instead, we are given $M$ nested function (hypothesis) classes such that the true models are contained in at least one such class. In this paper, we propose and analyze efficient model selection algorithms for MABs and MDPs that adapt to the smallest function class (among the $M$ nested classes) containing the true underlying model. Under a separability assumption on the nested hypothesis classes, we show that the cumulative regret of our adaptive algorithms matches that of an oracle which knows the correct function classes (i.e., $\mathcal{F}$ and $\mathcal{M}$) a priori. Furthermore, for both settings, we show that the cost of model selection is an additive term in the regret with only a weak (logarithmic) dependence on the learning horizon $T$.
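As a reading aid, the bandit setup described in the abstract can be written down roughly as follows. The indexed classes $\mathcal{F}_m$, the true reward function $f^*$, the index $m^*$, and the regret symbols $\mathrm{Reg}(T)$ and $\mathrm{Reg}_{\mathrm{oracle}}(T)$ are notation assumed here for illustration; the abstract itself does not fix them, and the exact form of the additive term is not stated there.

$$
\mathcal{F}_1 \subset \mathcal{F}_2 \subset \dots \subset \mathcal{F}_M,
\qquad
m^* = \min\{\, m \in \{1, \dots, M\} : f^* \in \mathcal{F}_m \,\},
\qquad
\mathcal{F} = \mathcal{F}_{m^*}
$$

$$
\mathrm{Reg}(T) \;\le\; \mathrm{Reg}_{\mathrm{oracle}}(T) \;+\; (\text{additive term logarithmic in } T)
$$

Under the same assumed notation, the MDP case is analogous, with nested classes $\mathcal{M}_1 \subset \dots \subset \mathcal{M}_M$ of transition kernels in place of the $\mathcal{F}_m$.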
Cite
Text
Ghosh and Chowdhury. "Model Selection in Reinforcement Learning with General Function Approximations." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022. doi:10.1007/978-3-031-26412-2_10
Markdown
[Ghosh and Chowdhury. "Model Selection in Reinforcement Learning with General Function Approximations." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022.](https://mlanthology.org/ecmlpkdd/2022/ghosh2022ecmlpkdd-model/) doi:10.1007/978-3-031-26412-2_10
BibTeX
@inproceedings{ghosh2022ecmlpkdd-model,
title = {{Model Selection in Reinforcement Learning with General Function Approximations}},
author = {Ghosh, Avishek and Chowdhury, Sayak Ray},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2022},
pages = {148--164},
doi = {10.1007/978-3-031-26412-2_10},
url = {https://mlanthology.org/ecmlpkdd/2022/ghosh2022ecmlpkdd-model/}
}