Optimism in Reinforcement Learning with Generalized Linear Function Approximation

Abstract

We design a new provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation. We analyze the algorithm under a new expressivity assumption that we call "optimistic closure," which is strictly weaker than assumptions from prior analyses for the linear setting. With optimistic closure, we prove that our algorithm enjoys a regret bound of $\widetilde{O}\left(H\sqrt{d^3 T}\right)$ where $H$ is the horizon, $d$ is the dimensionality of the state-action features and $T$ is the number of episodes. This is the first statistically and computationally efficient algorithm for reinforcement learning with generalized linear functions.
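The abstract does not spell out the function class, so as a rough, hedged sketch (the feature map $\phi$, link function $f$, and bonus form below are illustrative notation, not necessarily the paper's exact construction): generalized linear function approximation typically posits Q-values of the form

$$Q_h(s, a) = f\left(\langle \phi(s, a), \theta_h \rangle\right),$$

where $\phi(s,a) \in \mathbb{R}^d$ is a known feature map and $f$ is a known, monotone link function (the identity link recovers the linear setting). An optimistic algorithm in this family would act greedily with respect to an upper-confidence estimate such as

$$\widehat{Q}_h(s, a) = f\left(\langle \phi(s, a), \widehat{\theta}_h \rangle + \beta \,\big\|\phi(s, a)\big\|_{\Lambda_h^{-1}}\right), \qquad \Lambda_h = \lambda I + \sum_{\tau} \phi(s_h^\tau, a_h^\tau)\, \phi(s_h^\tau, a_h^\tau)^\top,$$

with the exploration bonus kept inside the link so that the optimistic estimate stays within a generalized linear class; closure of the Bellman backup over such optimistic estimates is the kind of property that the "optimistic closure" assumption formalizes.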

Cite

Text

Wang et al. "Optimism in Reinforcement Learning with Generalized Linear Function Approximation." International Conference on Learning Representations, 2021.

Markdown

[Wang et al. "Optimism in Reinforcement Learning with Generalized Linear Function Approximation." International Conference on Learning Representations, 2021.](https://mlanthology.org/iclr/2021/wang2021iclr-optimism/)

BibTeX

@inproceedings{wang2021iclr-optimism,
  title     = {{Optimism in Reinforcement Learning with Generalized Linear Function Approximation}},
  author    = {Wang, Yining and Wang, Ruosong and Du, Simon Shaolei and Krishnamurthy, Akshay},
  booktitle = {International Conference on Learning Representations},
  year      = {2021},
  url       = {https://mlanthology.org/iclr/2021/wang2021iclr-optimism/}
}