A Better Resource Allocation Algorithm with Semi-Bandit Feedback

Abstract

We study a sequential resource allocation problem among a fixed number of arms. On each iteration the algorithm distributes a resource among the arms in order to maximize the expected success rate. Allocating more of the resource to a given arm increases the probability that it succeeds, but only up to a cut-off. Following Lattimore et al. (2014), we assume that the success probability increases linearly with the allocation until it reaches one, after which allocating more of the resource is wasteful. These cut-off values are fixed and unknown to the learner. We present an algorithm for this problem and prove a regret upper bound of $O(\log n)$, improving over the best known bound of $O(\log^2 n)$. We also prove matching lower bounds, showing that our upper bound is tight. Simulations demonstrate the superiority of our algorithm.
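
To make the model concrete, here is a minimal sketch of the environment the abstract describes: each arm succeeds with probability linear in its allocation, capped at one. This is an illustration of the problem setting only, not the authors' algorithm; the arm count `K`, the threshold vector `nu`, the `budget`, and the helper `play_round` are all hypothetical choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3                            # number of arms (illustrative)
nu = np.array([0.5, 1.0, 2.0])   # cut-off values, hidden from the learner
budget = 2.0                     # total resource distributed each round

def play_round(allocation):
    """Allocate `allocation` (summing to at most `budget`) and observe
    semi-bandit feedback: one success/failure bit per arm."""
    assert allocation.sum() <= budget + 1e-9
    # Success probability grows linearly with the allocation, capped at 1:
    # allocating beyond nu[k] to arm k is wasteful.
    p = np.minimum(1.0, allocation / nu)
    return rng.random(K) < p     # one Bernoulli outcome per arm

# Example: a uniform allocation across the arms.
alloc = np.full(K, budget / K)
print(play_round(alloc))         # e.g. [ True  True False]
```

A learner in this setting observes per-arm feedback (hence "semi-bandit") and must estimate the unknown cut-offs `nu` to steer the budget toward allocations that are neither wasteful nor insufficient.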

Cite

Text

Dagan and Crammer. "A Better Resource Allocation Algorithm with Semi-Bandit Feedback." Proceedings of Algorithmic Learning Theory, 2018.

Markdown

[Dagan and Crammer. "A Better Resource Allocation Algorithm with Semi-Bandit Feedback." Proceedings of Algorithmic Learning Theory, 2018.](https://mlanthology.org/alt/2018/dagan2018alt-better/)

BibTeX

@inproceedings{dagan2018alt-better,
  title     = {{A Better Resource Allocation Algorithm with Semi-Bandit Feedback}},
  author    = {Dagan, Yuval and Crammer, Koby},
  booktitle = {Proceedings of Algorithmic Learning Theory},
  year      = {2018},
  pages     = {268--320},
  volume    = {83},
  url       = {https://mlanthology.org/alt/2018/dagan2018alt-better/}
}