Fully Gap-Dependent Bounds for Multinomial Logit Bandit

Abstract

We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most $K$ from a pool of $N$ items, and the buyer purchases an item from the assortment according to a MNL choice model. The objective is to learn the model parameters and maximize the expected revenue. We present (i) an algorithm that identifies the optimal assortment $S^*$ within $\widetilde{O}(\sum_{i = 1}^N \Delta_i^{-2})$ time steps with high probability, and (ii) an algorithm that incurs $O(\sum_{i \notin S^*} K\Delta_i^{-1} \log T)$ regret in $T$ time steps. To our knowledge, our algorithms are the \emph{first} to achieve gap-dependent bounds that \emph{fully} depends on the suboptimality gaps of \emph{all} items. Our technical contributions include an algorithmic framework that relates the MNL-bandit problem to a variant of the top-$K$ arm identification problem in multi-armed bandits, a generalized epoch-based offering procedure, and a layer-based adaptive estimation procedure.

Cite

Text

Yang. "Fully Gap-Dependent Bounds for Multinomial Logit Bandit." Artificial Intelligence and Statistics, 2021.

Markdown

[Yang. "Fully Gap-Dependent Bounds for Multinomial Logit Bandit." Artificial Intelligence and Statistics, 2021.](https://mlanthology.org/aistats/2021/yang2021aistats-fully/)

BibTeX

@inproceedings{yang2021aistats-fully,
  title     = {{Fully Gap-Dependent Bounds for Multinomial Logit Bandit}},
  author    = {Yang, Jiaqi},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2021},
  pages     = {199-207},
  volume    = {130},
  url       = {https://mlanthology.org/aistats/2021/yang2021aistats-fully/}
}