Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models
Abstract
In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection based algorithm achieves an item-independent regret bound of O(sqrt(T log log T), which matches information theoretical lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.
Cite
Text
Wang et al. "Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models." Neural Information Processing Systems, 2018.Markdown
[Wang et al. "Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/wang2018neurips-nearoptimal/)BibTeX
@inproceedings{wang2018neurips-nearoptimal,
title = {{Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models}},
author = {Wang, Yining and Chen, Xi and Zhou, Yuan},
booktitle = {Neural Information Processing Systems},
year = {2018},
pages = {3101-3110},
url = {https://mlanthology.org/neurips/2018/wang2018neurips-nearoptimal/}
}