Mortal Multi-Armed Bandits
Abstract
We formulate and study a new variant of the $k$-armed bandit problem, motivated by e-commerce applications. In our model, arms have a (stochastic) lifetime after which they expire. In this setting, an algorithm needs to continuously explore new arms, in contrast to the standard $k$-armed bandit model in which arms are available indefinitely and exploration is reduced once an optimal arm is identified with near-certainty. The main motivation for our setting is online advertising, where ads have limited lifetimes due to, for example, the nature of their content and their campaign budgets. An algorithm needs to choose among a large collection of ads, more than can be fully explored within the ads' lifetimes. We present an optimal algorithm for the state-aware (deterministic reward function) case, and build on this technique to obtain an algorithm for the state-oblivious (stochastic reward function) case. Empirical studies on various reward distributions, including one derived from a real-world ad serving application, show that the proposed algorithms significantly outperform standard multi-armed bandit approaches applied to these settings.
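To make the setting concrete, here is a minimal simulation sketch of the mortal-arm model described above. It is not the paper's algorithm: it uses a plain epsilon-greedy strategy as a stand-in, and the parameters (`death_prob`, `epsilon`, the number of arms, Bernoulli payoffs) are illustrative assumptions. The key feature it demonstrates is that arms expire stochastically and are replaced by fresh, unexplored arms, so exploration never fully stops.

```python
import random

def simulate_mortal_bandit(horizon=10_000, n_arms=10,
                           death_prob=0.01, epsilon=0.1, seed=0):
    """Epsilon-greedy play against mortal arms (illustrative sketch only).

    Each arm has a hidden Bernoulli payoff probability and dies
    independently with probability `death_prob` per step, after which
    it is replaced by a fresh arm with unknown payoff. All parameter
    names and values here are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)

    def new_arm():
        # Hidden payoff probability plus counters for the empirical mean.
        return {"p": rng.random(), "pulls": 0, "reward": 0.0}

    arms = [new_arm() for _ in range(n_arms)]
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            i = rng.randrange(len(arms))  # explore a random arm
        else:
            # Exploit the arm with the best empirical mean so far.
            i = max(range(len(arms)),
                    key=lambda j: arms[j]["reward"] / max(arms[j]["pulls"], 1))
        r = 1.0 if rng.random() < arms[i]["p"] else 0.0
        arms[i]["pulls"] += 1
        arms[i]["reward"] += r
        total += r
        # Mortality: each arm expires w.p. death_prob and is replaced,
        # so accumulated knowledge about a dead arm becomes useless.
        arms = [a if rng.random() >= death_prob else new_arm() for a in arms]
    return total

if __name__ == "__main__":
    print(simulate_mortal_bandit())
```

Running this with `death_prob=0` recovers the standard (immortal) bandit behavior, where exploration pays off indefinitely; with a positive `death_prob`, the expected lifetime of an arm is `1 / death_prob` pulls, which is what forces an algorithm to keep exploring new arms.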
Cite
Text
Chakrabarti et al. "Mortal Multi-Armed Bandits." Neural Information Processing Systems, 2008.

Markdown
[Chakrabarti et al. "Mortal Multi-Armed Bandits." Neural Information Processing Systems, 2008.](https://mlanthology.org/neurips/2008/chakrabarti2008neurips-mortal/)

BibTeX
@inproceedings{chakrabarti2008neurips-mortal,
title = {{Mortal Multi-Armed Bandits}},
author = {Chakrabarti, Deepayan and Kumar, Ravi and Radlinski, Filip and Upfal, Eli},
booktitle = {Neural Information Processing Systems},
year = {2008},
pages = {273-280},
url = {https://mlanthology.org/neurips/2008/chakrabarti2008neurips-mortal/}
}