Combinatorial Bandits for Maximum Value Reward Function Under Value-Index Feedback

ICLR 2024

/iclr/2024/wang2024iclr-combinatorial/

Abstract

We investigate the combinatorial multi-armed bandit problem where an action is to select $k$ arms from a set of base arms, and its reward is the maximum of the sample values of these $k$ arms, under a weak feedback structure that only returns the value and index of the arm with the maximum value. This novel feedback structure is much weaker than the semi-bandit feedback previously studied and is only slightly stronger than the full-bandit feedback, and thus it presents a new challenge for the online learning task. We propose an algorithm and derive a regret bound for instances where arm outcomes follow distributions with finite supports. Our algorithm introduces a novel concept of biased arm replacement to address the weak feedback challenge, and it achieves a distribution-dependent regret bound of $O((k/\Delta)\log(T))$ and a distribution-independent regret bound of $\tilde{O}(\sqrt{T})$, where $\Delta$ is the reward gap and $T$ is the time horizon. Notably, our regret bound is comparable to the bounds obtained under the more informative semi-bandit feedback. We demonstrate the effectiveness of our algorithm through experimental results.

PDF ICLR Semantic Scholar

Cite

Text

Wang et al. "Combinatorial Bandits for Maximum Value Reward Function Under Value-Index Feedback." International Conference on Learning Representations, 2024.

Markdown

[Wang et al. "Combinatorial Bandits for Maximum Value Reward Function Under Value-Index Feedback." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/wang2024iclr-combinatorial/)

BibTeX

@inproceedings{wang2024iclr-combinatorial,
  title     = {{Combinatorial Bandits for Maximum Value Reward Function Under Value-Index Feedback}},
  author    = {Wang, Yiliu and Chen, Wei and Vojnovic, Milan},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/wang2024iclr-combinatorial/}
}