Minimax Games with Bandits

Abernethy, Jacob D.; Warmuth, Manfred K.

Minimax Games with Bandits

COLT 2009

/colt/2009/abernethy2009colt-minimax/

Abstract

We study minimax optimal strategies for a repeated game where a learner picks a distribution over $n$ actions and an adversary reveals a loss vector, but the learner only observes the loss of the chosen action (bandit feedback). Extending prior minimax analysis of the hedge setting to the bandit case, we pose open problems about the structure and efficient computation of the minimax-optimal strategy.

PDF Semantic Scholar

Cite

Text

Abernethy and Warmuth. "Minimax Games with Bandits." Annual Conference on Computational Learning Theory, 2009.

Markdown

[Abernethy and Warmuth. "Minimax Games with Bandits." Annual Conference on Computational Learning Theory, 2009.](https://mlanthology.org/colt/2009/abernethy2009colt-minimax/)

BibTeX

@inproceedings{abernethy2009colt-minimax,
  title     = {{Minimax Games with Bandits}},
  author    = {Abernethy, Jacob D. and Warmuth, Manfred K.},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2009},
  url       = {https://mlanthology.org/colt/2009/abernethy2009colt-minimax/}
}