Adversarial Combinatorial Bandits with General Non-Linear Reward Functions

Abstract

In this paper we study the adversarial combinatorial bandit with a known non-linear reward function, extending existing work on adversarial linear combinatorial bandits. The adversarial combinatorial bandit with general non-linear reward is an important open problem in the bandit literature, and it remains unclear whether there is a significant gap from the cases of linear reward, stochastic bandits, or semi-bandit feedback. We show that, with $N$ arms and subsets of $K$ arms being chosen at each of $T$ time periods, the minimax optimal regret is $\widetilde\Theta_{d}(\sqrt{N^d T})$ if the reward function is a $d$-degree polynomial with $d < K$, and $\Theta_K(\sqrt{N^K T})$ if the reward function is not a low-degree polynomial. Both bounds are significantly different from the bound $O(\sqrt{\mathrm{poly}(N,K)T})$ for the linear case, which suggests that there is a fundamental gap between the linear and non-linear reward structures. Our result also applies to the adversarial assortment optimization problem in online recommendation. We show that, in the worst case of the adversarial assortment problem, the optimal algorithm must treat each of the $\binom{N}{K}$ assortments as an independent arm.
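To give a rough sense of the gap between the two regimes stated above, the following minimal sketch evaluates the leading terms $\sqrt{N^d T}$ and $\sqrt{N^K T}$ with all constants and logarithmic factors dropped. The values of `N`, `K`, `T`, and `d`, and the helper name `regret_scale`, are illustrative assumptions and do not come from the paper.

```python
from math import comb, sqrt


def regret_scale(n_arms: int, exponent: int, horizon: int) -> float:
    """Leading term sqrt(N^exponent * T); constants and log factors are ignored."""
    return sqrt(n_arms ** exponent * horizon)


# Hypothetical problem sizes, chosen only for illustration.
N, K, T = 20, 5, 10**6
d = 2  # assumed degree of a low-degree polynomial reward, with d < K

low_degree = regret_scale(N, d, T)  # scaling for a d-degree polynomial reward
general = regret_scale(N, K, T)     # scaling when the reward is not low-degree
num_assortments = comb(N, K)        # number of distinct size-K subsets of N arms

print(f"low-degree scaling: {low_degree:.3g}")
print(f"general scaling:    {general:.3g}")
print(f"ratio:              {general / low_degree:.3g}")
print(f"size-K subsets:     {num_assortments}")
```

The ratio between the two scalings is $\sqrt{N^{K-d}}$, so it grows quickly with $N$. For fixed $K$, $\binom{N}{K}$ is of order $N^K$, which is in line with the statement that each assortment must be treated as an independent arm in the worst case.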

Cite

Text

Han et al. "Adversarial Combinatorial Bandits with General Non-Linear Reward Functions." International Conference on Machine Learning, 2021.

Markdown

[Han et al. "Adversarial Combinatorial Bandits with General Non-Linear Reward Functions." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/han2021icml-adversarial/)

BibTeX

@inproceedings{han2021icml-adversarial,
  title     = {{Adversarial Combinatorial Bandits with General Non-Linear Reward Functions}},
  author    = {Han, Yanjun and Wang, Yining and Chen, Xi},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {4030-4039},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/han2021icml-adversarial/}
}