Model-Free Active Exploration in Reinforcement Learning
Abstract
We study the problem of exploration in Reinforcement Learning and present a novel model-free solution. We adopt an information-theoretical viewpoint and start from the instance-specific lower bound on the number of samples that have to be collected to identify a nearly-optimal policy. Deriving this lower bound, along with the optimal exploration strategy, entails solving an intricate optimization problem and requires a model of the system. In turn, most existing sample-optimal exploration algorithms rely on estimating the model. We derive an approximation of the instance-specific lower bound that only involves quantities that can be inferred using model-free approaches. Leveraging this approximation, we devise an ensemble-based model-free exploration strategy applicable to both tabular and continuous Markov decision processes. Numerical results demonstrate that our strategy identifies efficient policies faster than state-of-the-art exploration approaches.
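To make the general idea concrete, the sketch below illustrates one plausible reading of an ensemble-based model-free exploration scheme in a tabular MDP: an ensemble of Q-tables is trained with bootstrapped Q-learning updates, and actions are chosen by a score that combines ensemble disagreement, empirical sub-optimality gaps, and visit counts. This is a hypothetical illustration only, not the authors' algorithm; the random MDP, the particular score formula, and all parameters (ensemble size, step size, masking probability) are invented here for demonstration.

import numpy as np

# Hypothetical random tabular MDP, for illustration only.
rng = np.random.default_rng(0)
S, A, gamma = 10, 4, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states
R = rng.uniform(size=(S, A))                 # mean rewards

n_models = 5                                 # ensemble size (illustrative choice)
Q = np.zeros((n_models, S, A))               # ensemble of model-free Q estimates
counts = np.ones((S, A))                     # visit counts (start at 1 to avoid division by zero)
alpha = 0.1                                  # Q-learning step size

def explore_action(s):
    # Stand-in exploration score: ensemble spread (an uncertainty proxy) divided by
    # the squared empirical gap and the visit count, loosely echoing how gap- and
    # variance-like quantities enter instance-specific sample-complexity bounds.
    q_mean = Q[:, s, :].mean(axis=0)
    gap = np.maximum(q_mean.max() - q_mean, 1e-2)      # clipped empirical sub-optimality gaps
    disagreement = Q[:, s, :].std(axis=0) + 1e-2       # ensemble disagreement
    score = disagreement / (gap ** 2 * counts[s])
    return int(np.argmax(score))

s = 0
for t in range(20_000):
    a = explore_action(s)
    s_next = rng.choice(S, p=P[s, a])
    r = R[s, a] + 0.1 * rng.standard_normal()
    # Bootstrap-style masking: each ensemble member sees the transition with probability 1/2.
    for m in range(n_models):
        if rng.random() < 0.5:
            td_target = r + gamma * Q[m, s_next].max()
            Q[m, s, a] += alpha * (td_target - Q[m, s, a])
    counts[s, a] += 1
    s = s_next

greedy_policy = Q.mean(axis=0).argmax(axis=1)
print("Greedy policy from ensemble mean:", greedy_policy)

In the paper, the analogous allocation is derived from an approximation of the instance-specific lower bound; the score used above is merely a stand-in to show how such a strategy can be run with model-free quantities only.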
Cite
Text
Russo and Proutiere. "Model-Free Active Exploration in Reinforcement Learning." Neural Information Processing Systems, 2023.

Markdown
[Russo and Proutiere. "Model-Free Active Exploration in Reinforcement Learning." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/russo2023neurips-modelfree/)

BibTeX
@inproceedings{russo2023neurips-modelfree,
title = {{Model-Free Active Exploration in Reinforcement Learning}},
author = {Russo, Alessio and Proutiere, Alexandre},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/russo2023neurips-modelfree/}
}