When Is Agnostic Reinforcement Learning Statistically Tractable?

Abstract

We study the problem of agnostic PAC reinforcement learning (RL): given a policy class $\Pi$, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an $\epsilon$-suboptimal policy with respect to (\Pi)? Towards that end, we introduce a new complexity measure, called the spanning capacity, that depends solely on the set (\Pi) and is independent of the MDP dynamics. With a generative model, we show that the spanning capacity characterizes PAC learnability for every policy class $\Pi$. However, for online RL, the situation is more subtle. We show there exists a policy class $\Pi$ with a bounded spanning capacity that requires a superpolynomial number of samples to learn. This reveals a surprising separation for agnostic learnability between generative access and online access models (as well as between deterministic/stochastic MDPs under online access). On the positive side, we identify an additional sunflower structure which in conjunction with bounded spanning capacity enables statistically efficient online RL via a new algorithm called POPLER, which takes inspiration from classical importance sampling methods as well as recent developments for reachable-state identification and policy evaluation in reward-free exploration.

Cite

Text

Li et al. "When Is Agnostic Reinforcement Learning Statistically Tractable?." ICML 2023 Workshops: Frontiers4LCD, 2023.

Markdown

[Li et al. "When Is Agnostic Reinforcement Learning Statistically Tractable?." ICML 2023 Workshops: Frontiers4LCD, 2023.](https://mlanthology.org/icmlw/2023/li2023icmlw-agnostic/)

BibTeX

@inproceedings{li2023icmlw-agnostic,
  title     = {{When Is Agnostic Reinforcement Learning Statistically Tractable?}},
  author    = {Li, Gene and Jia, Zeyu and Rakhlin, Alexander and Sekhari, Ayush and Srebro, Nathan},
  booktitle = {ICML 2023 Workshops: Frontiers4LCD},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/li2023icmlw-agnostic/}
}