Regret Bounds for Sleeping Experts and Bandits

Abstract

We study on-line decision problems where the set of actions available to the decision algorithm varies over time. With a few notable exceptions, such problems have remained largely unaddressed in the literature, despite their applicability to a large number of practical problems. Departing from previous work on this “Sleeping Experts” problem, we compare algorithms against the payoff obtained by the best ordering of the actions, a natural benchmark for this type of problem. We study both the full-information (best expert) and partial-information (multi-armed bandit) settings, and consider both stochastic and adversarial reward models. For all settings we give algorithms achieving (almost) information-theoretically optimal regret bounds (up to a constant or a sub-logarithmic factor) with respect to the best-ordering benchmark.
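To make the best-ordering benchmark concrete, here is a minimal sketch (not from the paper; all names and the toy instance are illustrative) of how the benchmark payoff can be computed by brute force: a fixed ordering plays its highest-ranked available action each round, and the benchmark is the maximum total payoff over all orderings.

```python
# Hypothetical illustration of the best-ordering benchmark described in the
# abstract; the paper's algorithms achieve low regret against this quantity
# without enumerating orderings.
from itertools import permutations

def ordering_payoff(ordering, availability, rewards):
    """Total payoff of a fixed ordering: each round, play the first action
    in `ordering` that is awake (available) and collect its reward."""
    total = 0.0
    for avail, reward in zip(availability, rewards):
        for action in ordering:
            if action in avail:
                total += reward[action]
                break
    return total

def best_ordering_payoff(n_actions, availability, rewards):
    """Brute-force benchmark: max payoff over all n! orderings.
    Feasible only for small n; shown purely to define the benchmark."""
    return max(
        ordering_payoff(sigma, availability, rewards)
        for sigma in permutations(range(n_actions))
    )

# Toy instance: 3 actions, 4 rounds with varying availability sets.
availability = [{0, 1}, {1, 2}, {0, 2}, {0, 1, 2}]
rewards = [
    {0: 0.3, 1: 0.9},
    {1: 0.5, 2: 0.2},
    {0: 0.7, 2: 0.4},
    {0: 0.1, 1: 0.6, 2: 0.8},
]
print(best_ordering_payoff(3, availability, rewards))  # ordering (1, 0, 2) attains 2.7
```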

Cite

Text

Kleinberg et al. "Regret Bounds for Sleeping Experts and Bandits." Machine Learning, 2010. doi:10.1007/s10994-010-5178-7

Markdown

[Kleinberg et al. "Regret Bounds for Sleeping Experts and Bandits." Machine Learning, 2010.](https://mlanthology.org/mlj/2010/kleinberg2010mlj-regret/) doi:10.1007/s10994-010-5178-7

BibTeX

@article{kleinberg2010mlj-regret,
  title     = {{Regret Bounds for Sleeping Experts and Bandits}},
  author    = {Kleinberg, Robert and Niculescu-Mizil, Alexandru and Sharma, Yogeshwer},
  journal   = {Machine Learning},
  year      = {2010},
  pages     = {245--272},
  doi       = {10.1007/s10994-010-5178-7},
  volume    = {80},
  url       = {https://mlanthology.org/mlj/2010/kleinberg2010mlj-regret/}
}