Instance-Dependent Fixed-Budget Pure Exploration in Reinforcement Learning

Abstract

We study the problem of fixed budget pure exploration in reinforcement learning. The goal is to identify a near-optimal policy, given a fixed budget on the number of interactions with the environment. Unlike the standard PAC setting, we do not require the target error level $\epsilon$ and failure rate $\delta$ as input. We propose novel algorithms and provide, to the best of our knowledge, the first instance-dependent $\epsilon$-uniform guarantee, meaning that the probability that $\epsilon$-correctness is ensured can be obtained simultaneously for all $\epsilon$ above a budget-dependent threshold. It characterizes the budget requirements in terms of the problem-specific hardness of exploration. As a core component of our analysis, we derive a $\epsilon$-uniform guarantee for the multiple bandit problem—solving multiple multi-armed bandit instances simultaneously—which may be of independent interest. To enable our analysis, we also develop tools for reward-free exploration under the fixed-budget setting, which we believe will be useful for future work.

Cite

Text

Kim et al. "Instance-Dependent Fixed-Budget Pure Exploration in Reinforcement Learning." International Conference on Learning Representations, 2026.

Markdown

[Kim et al. "Instance-Dependent Fixed-Budget Pure Exploration in Reinforcement Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/kim2026iclr-instancedependent/)

BibTeX

@inproceedings{kim2026iclr-instancedependent,
  title     = {{Instance-Dependent Fixed-Budget Pure Exploration in Reinforcement Learning}},
  author    = {Kim, Yeongjong and Kim, Yeoneung and Jun, Kwang-Sung},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/kim2026iclr-instancedependent/}
}