Non-Deterministic Behavior of Thompson Sampling with Linear Payoffs and How to Avoid It

Abstract

Thompson Sampling with Linear Payoffs (LinTS) is popular contextual bandit algorithm for solving sequential decision making problem. While LinTS has been studied extensively in the academic literature, surprisingly, its behavior in terms of reproducibility did not receive the same attention. In this paper, we show that a standard and seemingly correct LinTS implementation leads to non-deterministic behavior. This might go unnoticed easily, yet impact results adversely. This calls the reproducibility of papers that use LinTS into question. Further, it forbids using this particular implementation in any industrial application where reproducibility is critical not only for debugging purposes but also for the trustworthiness of machine learning models. We first study the root cause of the non-deterministic behavior. We then conduct experiments on recommendation system benchmarks to demonstrate the impact of non-deterministic behavior in terms of reproducibility and downstream metrics. Finally, as a remedy, we show how to avoid the issue to ensure reproducible results and share general advice for practitioners.

Cite

Text

Kilitcioglu and Kadioglu. "Non-Deterministic Behavior of Thompson Sampling with Linear Payoffs and How to Avoid It." Transactions on Machine Learning Research, 2022.

Markdown

[Kilitcioglu and Kadioglu. "Non-Deterministic Behavior of Thompson Sampling with Linear Payoffs and How to Avoid It." Transactions on Machine Learning Research, 2022.](https://mlanthology.org/tmlr/2022/kilitcioglu2022tmlr-nondeterministic/)

BibTeX

@article{kilitcioglu2022tmlr-nondeterministic,
  title     = {{Non-Deterministic Behavior of Thompson Sampling with Linear Payoffs and How to Avoid It}},
  author    = {Kilitcioglu, Doruk and Kadioglu, Serdar},
  journal   = {Transactions on Machine Learning Research},
  year      = {2022},
  url       = {https://mlanthology.org/tmlr/2022/kilitcioglu2022tmlr-nondeterministic/}
}