Stochastic Top-$k$ Subset Bandits with Linear Space and Non-Linear Feedback

Abstract

Many real-world problems like Social Influence Maximization face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses $K$ out of $N$ arms at each time, with an aim to achieve an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a non-linear function of the chosen $K$ arms. The direct use of multi-armed bandit requires choosing among $N$-choose-$K$ options making the state space large. In this paper, we present a novel algorithm which is computationally efficient and the storage is linear in $N$. The proposed algorithm is a divide-and-conquer based strategy, that we call CMAB-SM. Further, the proposed algorithm achieves a \textit{regret bound} of $\tilde O(K^{\frac{1}{2}}N^{\frac{1}{3}}T^{\frac{2}{3}})$ for a time horizon $T$, which is \textit{sub-linear} in all parameters $T$, $N$, and $K$.

Cite

Text

Agarwal et al. "Stochastic Top-$k$ Subset Bandits with Linear Space and Non-Linear Feedback." Proceedings of the 32nd International Conference on Algorithmic Learning Theory, 2021.

Markdown

[Agarwal et al. "Stochastic Top-$k$ Subset Bandits with Linear Space and Non-Linear Feedback." Proceedings of the 32nd International Conference on Algorithmic Learning Theory, 2021.](https://mlanthology.org/alt/2021/agarwal2021alt-stochastic/)

BibTeX

@inproceedings{agarwal2021alt-stochastic,
  title     = {{Stochastic Top-$k$ Subset Bandits with Linear Space and Non-Linear Feedback}},
  author    = {Agarwal, Mridul and Aggarwal, Vaneet and Quinn, Christopher J. and Umrawal, Abhishek K.},
  booktitle = {Proceedings of the 32nd International Conference on Algorithmic Learning Theory},
  year      = {2021},
  pages     = {306-339},
  volume    = {132},
  url       = {https://mlanthology.org/alt/2021/agarwal2021alt-stochastic/}
}