Scalar Posterior Sampling with Applications

Abstract

We propose a practical non-episodic PSRL algorithm that, unlike recent state-of-the-art PSRL algorithms, uses a deterministic, model-independent episode switching schedule. Our algorithm, termed deterministic schedule PSRL (DS-PSRL), is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state-action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show how the assumptions of our algorithm satisfy a sensible parameterization for a large class of problems in sequential recommendations.

Cite

Text

Theocharous et al. "Scalar Posterior Sampling with Applications." Neural Information Processing Systems, 2018.

Markdown

[Theocharous et al. "Scalar Posterior Sampling with Applications." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/theocharous2018neurips-scalar/)

BibTeX

@inproceedings{theocharous2018neurips-scalar,
  title     = {{Scalar Posterior Sampling with Applications}},
  author    = {Theocharous, Georgios and Wen, Zheng and Yadkori, Yasin Abbasi and Vlassis, Nikos},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {7685--7693},
  url       = {https://mlanthology.org/neurips/2018/theocharous2018neurips-scalar/}
}