Provably Efficient Representation Selection in Low-Rank Markov Decision Processes: From Online to Offline RL

Abstract

The success of deep reinforcement learning (DRL) lies in its ability to learn a representation that is well suited to the exploration and exploitation task. To understand how the choice of representation can improve the efficiency of reinforcement learning (RL), we study representation selection for a class of low-rank Markov Decision Processes (MDPs) where the transition kernel can be represented in a bilinear form. We propose an efficient algorithm, called ReLEX, for representation learning in both online and offline RL. Specifically, we show that the online version of ReLEX, called ReLEX-UCB, always performs no worse than the state-of-the-art algorithm without representation selection, and achieves a strictly better constant regret if the representation function class has a "coverage" property over the entire state-action space. For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class can cover the state-action space, and that it achieves gap-dependent sample complexity. This is the first result with constant sample complexity for representation learning in offline RL.

Cite

Text

Zhang et al. "Provably Efficient Representation Selection in Low-Rank Markov Decision Processes: From Online to Offline RL." Uncertainty in Artificial Intelligence, 2023.

Markdown

[Zhang et al. "Provably Efficient Representation Selection in Low-Rank Markov Decision Processes: From Online to Offline RL." Uncertainty in Artificial Intelligence, 2023.](https://mlanthology.org/uai/2023/zhang2023uai-provably/)

BibTeX

@inproceedings{zhang2023uai-provably,
  title     = {{Provably Efficient Representation Selection in Low-Rank Markov Decision Processes: From Online to Offline RL}},
  author    = {Zhang, W. and He, J. and Zhou, D. and Gu, Q. and Zhang, A.},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2023},
  pages     = {2488--2497},
  volume    = {216},
  url       = {https://mlanthology.org/uai/2023/zhang2023uai-provably/}
}