Exponential Family Model-Based Reinforcement Learning via Score Matching

Abstract

We propose an optimistic model-based algorithm, dubbed SMRL, for finite-horizon episodic reinforcement learning (RL) when the transition model is specified by exponential family distributions with $d$ parameters and the reward is bounded and known. SMRL uses score matching, an unnormalized density estimation technique that enables efficient estimation of the model parameter by ridge regression. SMRL achieves $\tilde O(d\sqrt{H^3T})$ regret, where $H$ is the length of each episode and $T$ is the total number of interactions.
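The score-matching estimator the abstract refers to admits a closed form for exponential families: for $p(x) \propto \exp(\theta^\top \phi(x))$, Hyvärinen's score-matching objective is quadratic in $\theta$, so the ridge-regularized minimizer is $\hat\theta = -(A + \lambda I)^{-1} b$ with $A = \mathbb{E}[\nabla\phi(x)\,\nabla\phi(x)^\top]$ and $b = \mathbb{E}[\Delta\phi(x)]$. The sketch below illustrates this on a toy 1-D Gaussian (not the paper's MDP setting; variable names are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration (not the paper's SMRL algorithm): score matching for a
# 1-D Gaussian in exponential-family form p(x) ∝ exp(theta · phi(x)),
# with sufficient statistics phi(x) = (x, -x^2/2), so that
# theta = (mu / sigma^2, 1 / sigma^2).
x = rng.normal(loc=2.0, scale=1.5, size=20_000)  # samples from N(2, 1.5^2)

# First derivatives of phi w.r.t. x: d/dx (x) = 1, d/dx (-x^2/2) = -x.
grad_phi = np.stack([np.ones_like(x), -x])       # shape (2, n)
# Second derivatives (1-D Laplacians): 0 and -1, constant in x here.
lap_phi = np.array([0.0, -1.0])

A = grad_phi @ grad_phi.T / x.size               # empirical E[grad_phi grad_phi^T]
b = lap_phi                                      # empirical E[lap_phi]
lam = 1e-6                                       # ridge regularizer

# Closed-form ridge-regression estimate of the natural parameter.
theta = -np.linalg.solve(A + lam * np.eye(2), b)

sigma2_hat = 1.0 / theta[1]                      # ≈ 2.25
mu_hat = theta[0] * sigma2_hat                   # ≈ 2.0
```

No normalizing constant is ever computed, which is the point of score matching: the estimator only touches derivatives of the log-density, and the whole fit reduces to solving one regularized linear system.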

Cite

Text

Li et al. "Exponential Family Model-Based Reinforcement Learning via Score Matching." NeurIPS 2021 Workshops: DeepRL, 2021.

Markdown

[Li et al. "Exponential Family Model-Based Reinforcement Learning via Score Matching." NeurIPS 2021 Workshops: DeepRL, 2021.](https://mlanthology.org/neuripsw/2021/li2021neuripsw-exponential/)

BibTeX

@inproceedings{li2021neuripsw-exponential,
  title     = {{Exponential Family Model-Based Reinforcement Learning via Score Matching}},
  author    = {Li, Gene and Li, Junbo and Srebro, Nathan and Wang, Zhaoran and Yang, Zhuoran},
  booktitle = {NeurIPS 2021 Workshops: DeepRL},
  year      = {2021},
  url       = {https://mlanthology.org/neuripsw/2021/li2021neuripsw-exponential/}
}