Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms

Abstract

We consider the problem of navigating in a Markov decision process where extrinsic rewards are either absent or ignored. In this setting, the objective is to learn policies that reach all states reachable within a given number of steps (in expectation) from a starting state. We introduce a novel meta-algorithm that can use any online reinforcement learning algorithm (with appropriate regret guarantees) as a black box. Our algorithm also demonstrates how to transform the output of online algorithms to a batch setting. We prove an upper bound on the sample complexity of our algorithm in terms of the regret bound of the black-box RL algorithm it employs. Furthermore, we provide experimental results to validate the effectiveness of our algorithm and the correctness of our theoretical results.

Cite

Text

Gajane et al. "Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms." International Joint Conference on Artificial Intelligence, 2023. doi:10.24963/IJCAI.2023/413

Markdown

[Gajane et al. "Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms." International Joint Conference on Artificial Intelligence, 2023.](https://mlanthology.org/ijcai/2023/gajane2023ijcai-autonomous/) doi:10.24963/IJCAI.2023/413

BibTeX

@inproceedings{gajane2023ijcai-autonomous,
  title     = {{Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms}},
  author    = {Gajane, Pratik and Auer, Peter and Ortner, Ronald},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {3714--3722},
  doi       = {10.24963/IJCAI.2023/413},
  url       = {https://mlanthology.org/ijcai/2023/gajane2023ijcai-autonomous/}
}