Minimax Games with Bandits
Abstract
We study minimax optimal strategies for a repeated game where a learner picks a distribution over $n$ actions and an adversary reveals a loss vector, but the learner only observes the loss of the chosen action (bandit feedback). Extending prior minimax analysis of the hedge setting to the bandit case, we pose open problems about the structure and efficient computation of the minimax-optimal strategy.
Cite
Text
Abernethy and Warmuth. "Minimax Games with Bandits." Annual Conference on Computational Learning Theory, 2009.Markdown
[Abernethy and Warmuth. "Minimax Games with Bandits." Annual Conference on Computational Learning Theory, 2009.](https://mlanthology.org/colt/2009/abernethy2009colt-minimax/)BibTeX
@inproceedings{abernethy2009colt-minimax,
title = {{Minimax Games with Bandits}},
author = {Abernethy, Jacob D. and Warmuth, Manfred K.},
booktitle = {Annual Conference on Computational Learning Theory},
year = {2009},
url = {https://mlanthology.org/colt/2009/abernethy2009colt-minimax/}
}