Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

Abstract

We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the {\it reward dependent} delay setting, where realized delays may depend on the stochastic rewards, and the {\it reward-independent} delay setting. Our main contribution is algorithms that achieve near-optimal regret in each of the settings, with an additional additive dependence on the quantiles of the delay distribution. Our results do not make any assumptions on the delay distributions: in particular, we do not assume they come from any parametric family of distributions and allow for unbounded support and expectation; we further allow for the case of infinite delays where the algorithm might occasionally not observe any feedback.

Cite

Text

Lancewicki et al. "Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions." International Conference on Machine Learning, 2021.

Markdown

[Lancewicki et al. "Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/lancewicki2021icml-stochastic/)

BibTeX

@inproceedings{lancewicki2021icml-stochastic,
  title     = {{Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions}},
  author    = {Lancewicki, Tal and Segal, Shahar and Koren, Tomer and Mansour, Yishay},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {5969-5978},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/lancewicki2021icml-stochastic/}
}