Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models

Abstract

In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection based algorithm achieves an item-independent regret bound of O(sqrt(T log log T), which matches information theoretical lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.

Cite

Text

Wang et al. "Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models." Neural Information Processing Systems, 2018.

Markdown

[Wang et al. "Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/wang2018neurips-nearoptimal/)

BibTeX

@inproceedings{wang2018neurips-nearoptimal,
  title     = {{Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models}},
  author    = {Wang, Yining and Chen, Xi and Zhou, Yuan},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {3101-3110},
  url       = {https://mlanthology.org/neurips/2018/wang2018neurips-nearoptimal/}
}