An Asymptotically Optimal Algorithm for the Max K-Armed Bandit Problem
Abstract
We present an asymptotically optimal algorithm for the max variant of the k-armed bandit problem. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the expected maximum payoff � received over a series of n trials. Subject to certain distributional assumptions, we show that O ln ( 1 ln(n)2) δ ɛ2 � trials are sufficient to identify, with probability at least 1 − δ, a machine whose expected maximum payoff is within ɛ of optimal. This result leads to a strategy for solving the problem that is asymptotically optimal in the following sense: the gap between the expected maximum payoff obtained by using our strategy for n trials and that obtained by pulling the single best arm for all n trials approaches zero as n → ∞.
Cite
Text
Streeter and Smith. "An Asymptotically Optimal Algorithm for the Max K-Armed Bandit Problem." AAAI Conference on Artificial Intelligence, 2006.Markdown
[Streeter and Smith. "An Asymptotically Optimal Algorithm for the Max K-Armed Bandit Problem." AAAI Conference on Artificial Intelligence, 2006.](https://mlanthology.org/aaai/2006/streeter2006aaai-asymptotically/)BibTeX
@inproceedings{streeter2006aaai-asymptotically,
title = {{An Asymptotically Optimal Algorithm for the Max K-Armed Bandit Problem}},
author = {Streeter, Matthew J. and Smith, Stephen F.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2006},
pages = {135-142},
url = {https://mlanthology.org/aaai/2006/streeter2006aaai-asymptotically/}
}