Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms
Abstract
We consider the problem of navigating in a Markov decision process where extrinsic rewards are either absent or ignored. In this setting, the objective is to learn policies that reach all states reachable within a given number of steps (in expectation) from a starting state. We introduce a novel meta-algorithm that can use any online reinforcement learning algorithm (with appropriate regret guarantees) as a black box. Our algorithm demonstrates a method for transforming the output of online algorithms to a batch setting. We prove an upper bound on the sample complexity of our algorithm in terms of the regret bound of the black-box RL algorithm used. Furthermore, we provide experimental results to validate the effectiveness of our algorithm and the correctness of our theoretical results.
Cite
Text
Gajane et al. "Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms." International Joint Conference on Artificial Intelligence, 2023. doi:10.24963/IJCAI.2023/413

Markdown
[Gajane et al. "Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms." International Joint Conference on Artificial Intelligence, 2023.](https://mlanthology.org/ijcai/2023/gajane2023ijcai-autonomous/) doi:10.24963/IJCAI.2023/413

BibTeX
@inproceedings{gajane2023ijcai-autonomous,
title = {{Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms}},
author = {Gajane, Pratik and Auer, Peter and Ortner, Ronald},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2023},
pages = {3714--3722},
doi = {10.24963/IJCAI.2023/413},
url = {https://mlanthology.org/ijcai/2023/gajane2023ijcai-autonomous/}
}