TS-UCB: Improving on Thompson Sampling with Little to No Additional Computation

Abstract

Thompson sampling has become a ubiquitous approach to online decision problems with bandit feedback. The key algorithmic task for Thompson sampling is drawing a sample from the posterior of the optimal action. We propose an alternative arm selection rule, which we dub TS-UCB, that requires negligible additional computational effort but provides significant performance improvements relative to Thompson sampling. At each step, TS-UCB computes a score for each arm using two ingredients: posterior sample(s) and upper confidence bounds. TS-UCB can be used in any setting where these two quantities are available, and it is flexible in the number of posterior samples it takes as input. TS-UCB achieves materially lower regret on a comprehensive suite of synthetic and real-world datasets, including a personalized article recommendation dataset from Yahoo! and benchmark datasets from the deep bandit suite proposed in Riquelme et al. (2018). Finally, from a theoretical perspective, we establish optimal regret guarantees for TS-UCB for both the K-armed and linear bandit models.
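To make the two ingredients concrete, here is a minimal Python sketch of a Gaussian K-armed bandit in which each arm's score combines an estimate of the optimal value derived from posterior samples with an upper confidence bound. The specific score used here (a ratio of the estimated optimality gap to the confidence width), the UCB construction, and all constants are illustrative assumptions in the spirit of the abstract, not a verbatim transcription of the paper's rule.

import numpy as np

rng = np.random.default_rng(0)

K = 5                                  # number of arms (illustrative)
T = 2000                               # horizon (illustrative)
true_means = rng.normal(0.0, 1.0, K)   # unknown arm means

# Conjugate Gaussian model: after n_k unit-variance pulls,
# the posterior of arm k is N(mu_k, 1 / n_k).
counts = np.ones(K)                       # one initial pull per arm for simplicity
means = rng.normal(true_means, 1.0)       # one initial observation per arm
M = 10                                    # number of posterior samples per round

for t in range(T):
    std = 1.0 / np.sqrt(counts)
    # Ingredient 1: posterior samples -> estimate of the optimal value.
    samples = rng.normal(means, std, size=(M, K))
    opt_value = samples.max(axis=1).mean()
    # Ingredient 2: upper confidence bounds (standard sqrt(log t) width).
    ucb = means + np.sqrt(2.0 * np.log(t + 2)) * std
    # Score each arm as (estimated optimality gap) / (confidence width)
    # and pull the arm with the smallest score.
    gap = np.maximum(opt_value - means, 1e-12)
    width = np.maximum(ucb - means, 1e-12)
    arm = int(np.argmin(gap / width))
    # Observe a unit-variance Gaussian reward and update the posterior mean.
    reward = rng.normal(true_means[arm], 1.0)
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]

Note how the sketch reflects the abstract's flexibility claim: M, the number of posterior samples, is a free parameter, and with M = 1 the extra work over a single Thompson sampling draw is negligible.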

Cite

Text

Baek and Farias. "TS-UCB: Improving on Thompson Sampling with Little to No Additional Computation." Artificial Intelligence and Statistics, 2023.

Markdown

[Baek and Farias. "TS-UCB: Improving on Thompson Sampling with Little to No Additional Computation." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/baek2023aistats-tsucb/)

BibTeX

@inproceedings{baek2023aistats-tsucb,
  title     = {{TS-UCB: Improving on Thompson Sampling with Little to No Additional Computation}},
  author    = {Baek, Jackie and Farias, Vivek},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {11132--11148},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/baek2023aistats-tsucb/}
}