Optimistic PAC Reinforcement Learning: The Instance-Dependent View

Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann

ALT 2023 pp. 1460-1480

/alt/2023/tirinzoni2023alt-optimistic/

Abstract

Optimistic algorithms have been extensively studied for regret minimization in episodic tabular Markov Decision Processes (MDPs), both from a minimax and an instance-dependent view. However, for the PAC RL problem, where the goal is to identify a near-optimal policy with high probability, little is known about their instance-dependent sample complexity. A negative result of Wagenmaker et al. (2022) suggests that optimistic sampling rules cannot be used to attain the (still elusive) optimal instance-dependent sample complexity. On the positive side, we provide the first instance-dependent bound for an optimistic algorithm for PAC RL, BPI-UCRL, for which only minimax guarantees were available (Kaufmann et al., 2021). While our bound features some minimal visitation probabilities, it also features a refined notion of sub-optimality gap compared to the value gaps that appear in prior work. Moreover, in MDPs with deterministic transitions, we show that BPI-UCRL is actually near instance-optimal (up to a factor of the horizon). On the technical side, our analysis is very simple thanks to a new “target trick” of independent interest. We complement these findings with a novel hardness result explaining why the instance-dependent complexity of PAC RL cannot be easily related to that of regret minimization, unlike in the minimax regime.

PDF ALT Semantic Scholar

Cite

Text

Tirinzoni et al. "Optimistic PAC Reinforcement Learning: The Instance-Dependent View." Proceedings of The 34th International Conference on Algorithmic Learning Theory, 2023.

Markdown

[Tirinzoni et al. "Optimistic PAC Reinforcement Learning: The Instance-Dependent View." Proceedings of The 34th International Conference on Algorithmic Learning Theory, 2023.](https://mlanthology.org/alt/2023/tirinzoni2023alt-optimistic/)

BibTeX

@inproceedings{tirinzoni2023alt-optimistic,
  title     = {{Optimistic PAC Reinforcement Learning: The Instance-Dependent View}},
  author    = {Tirinzoni, Andrea and Al-Marjani, Aymen and Kaufmann, Emilie},
  booktitle = {Proceedings of The 34th International Conference on Algorithmic Learning Theory},
  year      = {2023},
  pages     = {1460-1480},
  volume    = {201},
  url       = {https://mlanthology.org/alt/2023/tirinzoni2023alt-optimistic/}
}