An Asymptotically Optimal Algorithm for the Max K-Armed Bandit Problem

Abstract

We present an asymptotically optimal algorithm for the max variant of the k-armed bandit problem. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the expected maximum payoff � received over a series of n trials. Subject to certain distributional assumptions, we show that O ln ( 1 ln(n)2) δ ɛ2 � trials are sufficient to identify, with probability at least 1 − δ, a machine whose expected maximum payoff is within ɛ of optimal. This result leads to a strategy for solving the problem that is asymptotically optimal in the following sense: the gap between the expected maximum payoff obtained by using our strategy for n trials and that obtained by pulling the single best arm for all n trials approaches zero as n → ∞.

Cite

Text

Streeter and Smith. "An Asymptotically Optimal Algorithm for the Max K-Armed Bandit Problem." AAAI Conference on Artificial Intelligence, 2006.

Markdown

[Streeter and Smith. "An Asymptotically Optimal Algorithm for the Max K-Armed Bandit Problem." AAAI Conference on Artificial Intelligence, 2006.](https://mlanthology.org/aaai/2006/streeter2006aaai-asymptotically/)

BibTeX

@inproceedings{streeter2006aaai-asymptotically,
  title     = {{An Asymptotically Optimal Algorithm for the Max K-Armed Bandit Problem}},
  author    = {Streeter, Matthew J. and Smith, Stephen F.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2006},
  pages     = {135-142},
  url       = {https://mlanthology.org/aaai/2006/streeter2006aaai-asymptotically/}
}