Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits
Abstract
I analyse the frequentist regret of the famous Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon. Remarkably, it turns out that this approach leads to finite-time regret guarantees comparable to those available for the popular UCB algorithm. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss some computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is a modest improvement on existing algorithms with finite-time regret guarantees, such as UCB and Thompson sampling.
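To make the setting concrete, below is a minimal simulation sketch of a generic index strategy on a Gaussian bandit with unit-variance noise, the model studied in the paper. The `index_fn` hook can hold any index; the `ucb_index` function supplied here is an illustrative UCB-style surrogate standing in for the finite-horizon Gittins index (whose exact computation the paper discusses), not Lattimore's formula.

```python
import numpy as np

def index_strategy(means, horizon, index_fn, rng=None):
    """Simulate a generic index strategy on a Gaussian bandit.

    index_fn(mu_hat, pulls, remaining) may be any index, e.g. an
    approximation to the finite-horizon Gittins index or a UCB bonus.
    Returns the cumulative (pseudo-)regret over the horizon.
    """
    rng = np.random.default_rng(rng)
    k = len(means)
    pulls = np.zeros(k, dtype=int)
    sums = np.zeros(k)
    regret = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # pull each arm once to initialise the estimates
        else:
            mu_hat = sums / pulls
            idx = [index_fn(mu_hat[i], pulls[i], horizon - t) for i in range(k)]
            arm = int(np.argmax(idx))
        reward = rng.normal(means[arm], 1.0)  # unit-variance Gaussian noise
        sums[arm] += reward
        pulls[arm] += 1
        regret += max(means) - means[arm]
    return regret

# Hypothetical stand-in index: empirical mean plus a UCB-style bonus
# that shrinks with pulls; illustrative only, not the paper's Gittins index.
def ucb_index(mu_hat, n, remaining):
    return mu_hat + np.sqrt(2.0 * np.log(max(remaining, 2)) / n)

print(index_strategy([0.0, -0.5], horizon=1000, index_fn=ucb_index, rng=0))
```

Swapping a Gittins approximation into `index_fn` is how one would reproduce the kind of empirical comparison against UCB and Thompson sampling that the abstract describes.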
Cite
Text
Lattimore. "Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits." Annual Conference on Computational Learning Theory, 2016.
Markdown
[Lattimore. "Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits." Annual Conference on Computational Learning Theory, 2016.](https://mlanthology.org/colt/2016/lattimore2016colt-regret/)
BibTeX
@inproceedings{lattimore2016colt-regret,
title = {{Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits}},
author = {Lattimore, Tor},
booktitle = {Annual Conference on Computational Learning Theory},
year = {2016},
pages = {1214--1245},
url = {https://mlanthology.org/colt/2016/lattimore2016colt-regret/}
}