Observe Before Play: Multi-Armed Bandit with Pre-Observations

Zuo, Jinhang; Zhang, Xiaoxi; Joe-Wong, Carlee

doi:10.1609/AAAI.V34I04.6187

Observe Before Play: Multi-Armed Bandit with Pre-Observations

Jinhang Zuo, Xiaoxi Zhang, Carlee Joe-Wong

AAAI 2020 pp. 7023-7030

doi:10.1609/AAAI.V34I04.6187 /aaai/2020/zuo2020aaai-observe/

Abstract

We consider the stochastic multi-armed bandit (MAB) problem in a setting where a player can pay to pre-observe arm rewards before playing an arm in each round. Apart from the usual trade-off between exploring new arms to find the best one and exploiting the arm believed to offer the highest reward, we encounter an additional dilemma: pre-observing more arms gives a higher chance to play the best one, but incurs a larger cost. For the single-player setting, we design an Observe-Before-Play Upper Confidence Bound (OBP-UCB) algorithm for K arms with Bernoulli rewards, and prove a T-round regret upper bound O(K2log T). In the multi-player setting, collisions will occur when players select the same arm to play in the same round. We design a centralized algorithm, C-MP-OBP, and prove its T-round regret relative to an offline greedy strategy is upper bounded in O(K4/M2log T) for K arms and M players. We also propose distributed versions of the C-MP-OBP policy, called D-MP-OBP and D-MP-Adapt-OBP, achieving logarithmic regret with respect to collision-free target policies. Experiments on synthetic data and wireless channel traces show that C-MP-OBP and D-MP-OBP outperform random heuristics and offline optimal policies that do not allow pre-observations.

PDF AAAI Semantic Scholar

Cite

Text

Zuo et al. "Observe Before Play: Multi-Armed Bandit with Pre-Observations." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I04.6187

Markdown

[Zuo et al. "Observe Before Play: Multi-Armed Bandit with Pre-Observations." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/zuo2020aaai-observe/) doi:10.1609/AAAI.V34I04.6187

BibTeX

@inproceedings{zuo2020aaai-observe,
  title     = {{Observe Before Play: Multi-Armed Bandit with Pre-Observations}},
  author    = {Zuo, Jinhang and Zhang, Xiaoxi and Joe-Wong, Carlee},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {7023-7030},
  doi       = {10.1609/AAAI.V34I04.6187},
  url       = {https://mlanthology.org/aaai/2020/zuo2020aaai-observe/}
}