Cascading Contextual Assortment Bandits

Abstract

We present a new combinatorial bandit model, the \textit{cascading contextual assortment bandit}. This model serves as a generalization of both existing cascading bandits and assortment bandits, broadening their applicability in practice. For this model, we propose our first UCB bandit algorithm, UCB-CCA. We prove that this algorithm achieves a $T$-step regret upper-bound of $\tilde{\mathcal{O}}(\frac{1}{\kappa}d\sqrt{T})$, sharper than existing bounds for cascading contextual bandits by eliminating dependence on cascade length $K$. To improve the dependence on problem-dependent constant $\kappa$, we introduce our second algorithm, UCB-CCA+, which leverages a new Bernstein-type concentration result. This algorithm achieves $\tilde{\mathcal{O}}(d\sqrt{T})$ without dependence on $\kappa$ in the leading term. We substantiate our theoretical claims with numerical experiments, demonstrating the practical efficacy of our proposed methods.

Cite

Text

Choi et al. "Cascading Contextual Assortment Bandits." Neural Information Processing Systems, 2023.

Markdown

[Choi et al. "Cascading Contextual Assortment Bandits." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/choi2023neurips-cascading/)

BibTeX

@inproceedings{choi2023neurips-cascading,
  title     = {{Cascading Contextual Assortment Bandits}},
  author    = {Choi, Hyun-jun and Udwani, Rajan and Oh, Min-hwan},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/choi2023neurips-cascading/}
}