Bayesian Design Principles for Frequentist Sequential Learning
Abstract
We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization approach to create "algorithmic beliefs" at each round, and use Bayesian posteriors to make decisions. This is the first approach to make Bayesian-type algorithms prior-free and applicable to adversarial settings in a generic and optimal manner. Moreover, the algorithms are simple and often efficient to implement. As a major application, we present a novel algorithm for multi-armed bandits that achieves "best-of-all-worlds" empirical performance in stochastic, adversarial, and non-stationary environments. We also illustrate how these principles can be used in linear bandits, convex bandits, and reinforcement learning.
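To make the "use Bayesian posteriors to make decisions" idea concrete, below is a minimal sketch of a generic posterior-sampling loop for a Bernoulli multi-armed bandit. It is not the paper's algorithm: it assumes a fixed Beta(1, 1) prior with conjugate updates, whereas the paper constructs prior-free "algorithmic beliefs" by solving an optimization problem at each round. The function and argument names (`posterior_sampling_bandit`, `pull_arm`) are illustrative assumptions.

```python
import numpy as np

def posterior_sampling_bandit(pull_arm, n_arms, horizon, rng=None):
    """Generic Bayesian-posterior decision loop for Bernoulli bandits.

    Illustrative only: keeps a fixed Beta(1, 1) prior with conjugate updates;
    the paper instead builds prior-free "algorithmic beliefs" each round.
    """
    rng = np.random.default_rng() if rng is None else rng
    successes = np.ones(n_arms)   # Beta posterior parameter alpha per arm
    failures = np.ones(n_arms)    # Beta posterior parameter beta per arm
    rewards = []
    for _ in range(horizon):
        # Sample a plausible mean reward for each arm from the current belief,
        # then act greedily with respect to that sample.
        theta = rng.beta(successes, failures)
        arm = int(np.argmax(theta))
        reward = pull_arm(arm)    # environment returns a reward in {0, 1}
        successes[arm] += reward
        failures[arm] += 1 - reward
        rewards.append(reward)
    return rewards

# Example usage with a hypothetical 3-armed Bernoulli environment.
means = [0.2, 0.5, 0.8]
rng = np.random.default_rng(0)
history = posterior_sampling_bandit(
    lambda a: int(rng.binomial(1, means[a])), n_arms=3, horizon=1000, rng=rng
)
```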
Cite
Text
Xu and Zeevi. "Bayesian Design Principles for Frequentist Sequential Learning." International Conference on Machine Learning, 2023.
Markdown
[Xu and Zeevi. "Bayesian Design Principles for Frequentist Sequential Learning." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/xu2023icml-bayesian/)
BibTeX
@inproceedings{xu2023icml-bayesian,
  title     = {{Bayesian Design Principles for Frequentist Sequential Learning}},
  author    = {Xu, Yunbei and Zeevi, Assaf},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {38768--38800},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/xu2023icml-bayesian/}
}