Nearly Minimax Optimal Regret for Multinomial Logistic Bandit
Abstract
In this paper, we study the contextual multinomial logit (MNL) bandit problem in which a learning agent sequentially selects an assortment based on contextual information, and user feedback follows an MNL choice model.There has been a significant discrepancy between lower and upper regret bounds, particularly regarding the maximum assortment size $K$. Additionally, the variation in reward structures between these bounds complicates the quest for optimality. Under uniform rewards, where all items have the same expected reward, we establish a regret lower bound of $\Omega(d\sqrt{\smash[b]{T/K}})$ and propose a constant-time algorithm, OFU-MNL+, that achieves a matching upper bound of $\tilde{\mathcal{O}}(d\sqrt{\smash[b]{T/K}})$. We also provide instance-dependent minimax regret bounds under uniform rewards.Under non-uniform rewards, we prove a lower bound of $\Omega(d\sqrt{T})$ and an upper bound of $\tilde{\mathcal{O}}(d\sqrt{T})$, also achievable by OFU-MNL+. Our empirical studies support these theoretical findings. To the best of our knowledge, this is the first work in the contextual MNL bandit literature to prove minimax optimality --- for either uniform or non-uniform reward setting --- and to propose a computationally efficient algorithm that achieves this optimality up to logarithmic factors.
Cite
Text
Lee and Oh. "Nearly Minimax Optimal Regret for Multinomial Logistic Bandit." Neural Information Processing Systems, 2024. doi:10.52202/079017-3461Markdown
[Lee and Oh. "Nearly Minimax Optimal Regret for Multinomial Logistic Bandit." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/lee2024neurips-nearly/) doi:10.52202/079017-3461BibTeX
@inproceedings{lee2024neurips-nearly,
title = {{Nearly Minimax Optimal Regret for Multinomial Logistic Bandit}},
author = {Lee, Joongkyu and Oh, Min-hwan},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-3461},
url = {https://mlanthology.org/neurips/2024/lee2024neurips-nearly/}
}