Bounded Regret for Finite-Armed Structured Bandits
Abstract
We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms. We present a new algorithm for this general class of problems and show that under certain circumstances it is possible to achieve finite expected cumulative regret. We also give problem-dependent lower bounds on the cumulative regret showing that at least in special cases the new algorithm is nearly optimal.
Cite
Text
Lattimore and Munos. "Bounded Regret for Finite-Armed Structured Bandits." Neural Information Processing Systems, 2014.Markdown
[Lattimore and Munos. "Bounded Regret for Finite-Armed Structured Bandits." Neural Information Processing Systems, 2014.](https://mlanthology.org/neurips/2014/lattimore2014neurips-bounded/)BibTeX
@inproceedings{lattimore2014neurips-bounded,
title = {{Bounded Regret for Finite-Armed Structured Bandits}},
author = {Lattimore, Tor and Munos, Remi},
booktitle = {Neural Information Processing Systems},
year = {2014},
pages = {550-558},
url = {https://mlanthology.org/neurips/2014/lattimore2014neurips-bounded/}
}