Near-Optimal MNL Bandits Under Risk Criteria

Abstract

We study MNL bandits, a variant of the traditional multi-armed bandit problem, under risk criteria. Unlike the ordinary expected-revenue objective, risk criteria are more general objectives widely used in industry and business. We design algorithms for a broad class of risk criteria, including but not limited to the well-known conditional value-at-risk, Sharpe ratio, and entropic risk, and prove that they achieve near-optimal regret. As a complement, we also conduct experiments on both synthetic and real data to demonstrate the empirical performance of our proposed algorithms.
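For readers unfamiliar with the three named criteria, the following are their textbook definitions for a revenue random variable $X$; they are included here for reference only, and the paper's exact parameterizations and sign conventions may differ.

% Textbook definitions of the three risk criteria named in the abstract,
% stated for a revenue random variable X (the paper's exact
% parameterizations and sign conventions may differ).
\[
  \mathrm{VaR}_\alpha(X) = \inf\{x \in \mathbb{R} : \Pr(X \le x) \ge \alpha\},
  \qquad
  \mathrm{CVaR}_\alpha(X) = \mathbb{E}\!\left[X \mid X \le \mathrm{VaR}_\alpha(X)\right],
\]
\[
  \mathrm{Sharpe}(X) = \frac{\mathbb{E}[X]}{\sqrt{\mathrm{Var}(X)}},
  \qquad
  \mathrm{Ent}_\lambda(X) = \frac{1}{\lambda}\log \mathbb{E}\!\left[e^{\lambda X}\right],
  \quad \lambda \neq 0.
\]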

Cite

Text

Guangyu Xi, Chao Tao, and Yuan Zhou. "Near-Optimal MNL Bandits Under Risk Criteria." AAAI Conference on Artificial Intelligence, 2021, pp. 10397-10404. doi:10.1609/aaai.v35i12.17245

Markdown

[Guangyu Xi, Chao Tao, and Yuan Zhou. "Near-Optimal MNL Bandits Under Risk Criteria." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/xi2021aaai-near/) doi:10.1609/aaai.v35i12.17245

BibTeX

@inproceedings{xi2021aaai-near,
  title     = {{Near-Optimal MNL Bandits Under Risk Criteria}},
  author    = {Xi, Guangyu and Tao, Chao and Zhou, Yuan},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {10397--10404},
  doi       = {10.1609/aaai.v35i12.17245},
  url       = {https://mlanthology.org/aaai/2021/xi2021aaai-near/}
}