Following the Perturbed Leader to Gamble at Multi-Armed Bandits

Abstract

Following the perturbed leader ( fpl ) is a powerful technique for solving online decision problems. Kalai and Vempala [1] rediscovered this algorithm recently. A traditional model for online decision problems is the multi-armed bandit. In it a gambler has to choose at each round one of the k levers to pull with the intention to minimize the cumulated cost. There are four versions of the nonstochastic optimization setting out of which the most demanding one is a game played against an adaptive adversary in the bandit setting. An adaptive adversary may alter its game strategy of assigning costs to decisions depending on the decisions chosen by the gambler in the past. In the bandit setting the gambler only gets to know the cost of the choice he made, rather than the costs of all available alternatives. In this work we show that the very straightforward and easy to implement algorithm Adaptive Bandit fpl can attain a regret of $O(\sqrt{T \ln T})$ against an adaptive adversary. This regret holds with respect to the best lever in hindsight and matches the previous best regret bounds of $O(\sqrt{T \ln T})$ .

Cite

Text

Kujala and Elomaa. "Following the Perturbed Leader to Gamble at Multi-Armed Bandits." International Conference on Algorithmic Learning Theory, 2007. doi:10.1007/978-3-540-75225-7_16

Markdown

[Kujala and Elomaa. "Following the Perturbed Leader to Gamble at Multi-Armed Bandits." International Conference on Algorithmic Learning Theory, 2007.](https://mlanthology.org/alt/2007/kujala2007alt-following/) doi:10.1007/978-3-540-75225-7_16

BibTeX

@inproceedings{kujala2007alt-following,
  title     = {{Following the Perturbed Leader to Gamble at Multi-Armed Bandits}},
  author    = {Kujala, Jussi and Elomaa, Tapio},
  booktitle = {International Conference on Algorithmic Learning Theory},
  year      = {2007},
  pages     = {166-180},
  doi       = {10.1007/978-3-540-75225-7_16},
  url       = {https://mlanthology.org/alt/2007/kujala2007alt-following/}
}