Budget-Constrained Bandits over General Cost and Reward Distributions

Abstract

We consider a budget-constrained bandit problem in which each arm pull incurs a random cost and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is general in the sense that it allows correlated and potentially heavy-tailed cost-reward pairs that can take on negative values, as required by many applications. We show that if moments of order $(2+\gamma)$ for some $\gamma > 0$ exist for all cost-reward pairs, then $O(\log B)$ regret is achievable for a budget $B>0$. To achieve tight regret bounds, we propose algorithms that exploit the correlation between the cost and reward of each arm by extracting their common information via linear minimum mean-square error (LMMSE) estimation. We prove a regret lower bound for this problem and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost-reward pairs.
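
The central device in the abstract is the LMMSE estimate of an arm's reward given its observed cost, $\hat{R} = \mathbb{E}[R] + \frac{\mathrm{Cov}(R,C)}{\mathrm{Var}(C)}\,(C - \mathbb{E}[C])$, whose residual variance $\mathrm{Var}(R)(1-\rho^2)$ shrinks as the cost-reward correlation $\rho$ grows, tightening the reward confidence interval. The sketch below illustrates this idea inside a generic budgeted UCB index on the reward rate $\mathbb{E}[R]/\mathbb{E}[C]$. It is a minimal illustration under stated assumptions, not the paper's algorithms; the function names, the exploration constant `alpha`, and the Gaussian simulation are all illustrative choices.

```python
# Minimal sketch of an LMMSE-assisted budgeted UCB index. This illustrates
# the idea in the abstract, not the paper's algorithms; the exploration
# constant `alpha` and the simulation setup below are assumptions.
import numpy as np

rng = np.random.default_rng(0)


def lmmse_residual_var(costs, rewards):
    """Reward variance left unexplained by the LMMSE fit on cost.

    The LMMSE estimate of R from C is E[R] + Cov(R,C)/Var(C) * (C - E[C]);
    its residual variance Var(R) * (1 - rho^2) is what replaces Var(R) in
    the reward confidence radius when cost and reward are correlated.
    """
    var_c, var_r = np.var(costs), np.var(rewards)
    if var_c == 0.0:
        return var_r
    cov = np.cov(costs, rewards, bias=True)[0, 1]
    return max(var_r - cov ** 2 / var_c, 0.0)


def run_budgeted_ucb(arms, budget, alpha=2.0):
    """Pull arms until the cost budget is spent; return the total reward.

    `arms` is a list of callables, each returning one correlated
    (cost, reward) sample. The index is an optimistic estimate of the
    reward rate E[R]/E[C]: the reward estimate is inflated by a radius
    driven by the LMMSE residual variance, the cost estimate is deflated.
    """
    K = len(arms)
    costs = [[] for _ in range(K)]
    rewards = [[] for _ in range(K)]
    spent = total_reward = 0.0
    t = 0
    while spent < budget:
        t += 1
        if t <= K:  # pull each arm once to initialize the estimates
            k = t - 1
        else:
            index = []
            for j in range(K):
                n = len(costs[j])
                mu_c, mu_r = np.mean(costs[j]), np.mean(rewards[j])
                rad_r = np.sqrt(
                    alpha * lmmse_residual_var(costs[j], rewards[j]) * np.log(t) / n)
                rad_c = np.sqrt(alpha * np.var(costs[j]) * np.log(t) / n)
                index.append((mu_r + rad_r) / max(mu_c - rad_c, 1e-9))
            k = int(np.argmax(index))
        c, r = arms[k]()
        costs[k].append(c)
        rewards[k].append(r)
        spent += c
        total_reward += r
    return total_reward


def make_gaussian_arm(mu_c, mu_r, rho):
    """Jointly Gaussian (cost, reward) arm with correlation rho (illustrative)."""
    cov = [[0.25, 0.25 * rho], [0.25 * rho, 0.25]]
    return lambda: tuple(rng.multivariate_normal([mu_c, mu_r], cov))


# Two arms with equal mean cost but different mean rewards; correlation 0.8.
arms = [make_gaussian_arm(1.0, 1.0, 0.8), make_gaussian_arm(1.0, 0.8, 0.8)]
print(run_budgeted_ucb(arms, budget=500.0))
```

With positively correlated pairs, the LMMSE residual variance is a factor $(1-\rho^2)$ smaller than $\mathrm{Var}(R)$, so the index concentrates on the better arm after fewer pulls than a correlation-agnostic radius would allow; this is the mechanism behind the tighter problem-dependent bounds described in the abstract.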

Cite

Text

Cayci et al. "Budget-Constrained Bandits over General Cost and Reward Distributions." Artificial Intelligence and Statistics, 2020.

Markdown

[Cayci et al. "Budget-Constrained Bandits over General Cost and Reward Distributions." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/cayci2020aistats-budgetconstrained/)

BibTeX

@inproceedings{cayci2020aistats-budgetconstrained,
  title     = {{Budget-Constrained Bandits over General Cost and Reward Distributions}},
  author    = {Cayci, Semih and Eryilmaz, Atilla and Srikant, R.},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {4388--4398},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/cayci2020aistats-budgetconstrained/}
}