Contextual Bandits with Similarity Information

Abstract

In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with the chosen alternative. While the case of small strategy sets is by now well understood, much recent work has focused on MAB problems with exponentially or infinitely large strategy sets, where one must assume extra structure to make the problem tractable. In particular, recent literature has considered information on similarity between arms.
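
For concreteness, the round-by-round protocol described above can be sketched in a few lines of Python. The sketch below uses the classical UCB1 index rule on a small fixed arm set; it is a generic illustration of the MAB interaction, not the paper's method (the paper's contextual zooming algorithm additionally exploits similarity information on arms and contexts). The Bernoulli arm means and the horizon are illustrative assumptions.

import math
import random

def ucb1(arm_means, horizon, rng=None):
    """Play `horizon` rounds against Bernoulli arms with the given means
    (illustrative assumption), choosing one arm per round and observing
    only that arm's payoff, as in the MAB protocol above."""
    rng = rng or random.Random(0)
    k = len(arm_means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative payoff per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # play each arm once to initialize
        else:
            # UCB1 index: empirical mean plus a confidence radius
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        payoff = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += payoff
        total += payoff
    return total

print(ucb1([0.3, 0.5, 0.7], horizon=10_000))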

Cite

Text

Slivkins. "Contextual Bandits with Similarity Information." Journal of Machine Learning Research, 2014.

Markdown

[Slivkins. "Contextual Bandits with Similarity Information." Journal of Machine Learning Research, 2014.](https://mlanthology.org/jmlr/2014/slivkins2014jmlr-contextual/)

BibTeX

@article{slivkins2014jmlr-contextual,
  title     = {{Contextual Bandits with Similarity Information}},
  author    = {Slivkins, Aleksandrs},
  journal   = {Journal of Machine Learning Research},
  year      = {2014},
  pages     = {2533--2568},
  volume    = {15},
  url       = {https://mlanthology.org/jmlr/2014/slivkins2014jmlr-contextual/}
}