Improved Algorithms for Adversarial Bandits with Unbounded Losses

Abstract

We consider the Adversarial Multi-Armed Bandits (MAB) problem with unbounded losses, where the algorithms have no prior knowledge of the sizes of the losses. We present \texttt{UMAB-NN} and \texttt{UMAB-G}, two algorithms for non-negative and general unbounded losses, respectively. For non-negative unbounded losses, \texttt{UMAB-NN} achieves the first fully adaptive and scale-free regret bound. Building on that, we further develop \texttt{UMAB-G}, which can learn from arbitrary unbounded losses. Complementing the algorithms is a new lower bound, demonstrating a fundamental trade-off between adaptivity and minimax optimality in scale-free MAB. Finally, we perform extensive empirical evaluations, showing that our algorithms consistently outperform all existing algorithms that handle unbounded losses.

Cite

Text

Chen and Zhang. "Improved Algorithms for Adversarial Bandits with Unbounded Losses." ICML 2024 Workshops: ARLET, 2024.

Markdown

[Chen and Zhang. "Improved Algorithms for Adversarial Bandits with Unbounded Losses." ICML 2024 Workshops: ARLET, 2024.](https://mlanthology.org/icmlw/2024/chen2024icmlw-improved/)

BibTeX

@inproceedings{chen2024icmlw-improved,
  title     = {{Improved Algorithms for Adversarial Bandits with Unbounded Losses}},
  author    = {Chen, Mingyu and Zhang, Xuezhou},
  booktitle = {ICML 2024 Workshops: ARLET},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/chen2024icmlw-improved/}
}