Best of Both Worlds: Regret Minimization Versus Minimax Play

Abstract

In this paper, we investigate the existence of online learning algorithms with bandit feedback that simultaneously guarantee $O(1)$ regret compared to a given comparator strategy and $\tilde{O}(\sqrt{T})$ regret compared to any fixed strategy, where $T$ is the number of rounds. We provide the first affirmative answer to this question whenever the comparator strategy places positive probability on every action. In the context of zero-sum games with min-max value zero, in both normal and extensive form, we show that our results make it possible to risk at most $O(1)$ loss while being able to gain $\Omega(T)$ against exploitable opponents, thereby combining the benefits of no-regret algorithms and minimax play.
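To make the two regret notions in the abstract concrete, the following minimal sketch simulates a zero-sum game with min-max value zero (matching pennies, where the uniform strategy is the full-support comparator) and reports both the regret against the comparator and the external regret against the best fixed action. The learner here is a generic Exp3-style bandit algorithm and the biased opponent is hypothetical; neither is the paper's construction, they only illustrate what the $O(1)$ and $\tilde{O}(\sqrt{T})$ guarantees are measured against.

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-sum game with min-max value zero: matching pennies.
# A[i, j] is the row player's payoff; the minimax (comparator) strategy is uniform.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
comparator = np.array([0.5, 0.5])  # full support: positive probability on every action

T = 10_000
n_actions = A.shape[0]

# A hypothetical exploitable opponent: plays column 0 with probability 0.8.
opp_probs = np.array([0.8, 0.2])

# Exp3-style bandit learner (a stand-in, not the paper's algorithm).
eta = np.sqrt(np.log(n_actions) / (n_actions * T))
weights = np.zeros(n_actions)

payoffs = np.zeros(T)                       # learner's realized payoff each round
action_payoffs = np.zeros((T, n_actions))   # payoff of every fixed action (for regret bookkeeping)

for t in range(T):
    probs = np.exp(weights - weights.max())
    probs /= probs.sum()
    a = rng.choice(n_actions, p=probs)      # learner's sampled action
    j = rng.choice(n_actions, p=opp_probs)  # opponent's column

    payoffs[t] = A[a, j]
    action_payoffs[t] = A[:, j]

    # Bandit feedback: importance-weighted estimate of the chosen action's payoff.
    weights[a] += eta * A[a, j] / probs[a]

# Regret vs. the comparator (minimax) strategy: the paper asks for O(1) here.
regret_vs_comparator = (action_payoffs @ comparator).sum() - payoffs.sum()

# External regret vs. the best fixed action: the paper asks for roughly O(sqrt(T)) here.
regret_vs_best_fixed = action_payoffs.sum(axis=0).max() - payoffs.sum()

print(f"cumulative payoff:     {payoffs.sum():8.1f}")
print(f"regret vs. comparator: {regret_vs_comparator:8.1f}")
print(f"regret vs. best fixed: {regret_vs_best_fixed:8.1f}")
```

Against the biased opponent above, the cumulative payoff grows roughly linearly in $T$, while the regret against the uniform comparator stays small; an algorithm with the guarantees described in the abstract would keep the latter bounded by a constant even in the worst case.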

Cite

Text

Müller et al. "Best of Both Worlds: Regret Minimization Versus Minimax Play." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Müller et al. "Best of Both Worlds: Regret Minimization Versus Minimax Play." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/muller2025icml-best/)

BibTeX

@inproceedings{muller2025icml-best,
  title     = {{Best of Both Worlds: Regret Minimization Versus Minimax Play}},
  author    = {Müller, Adrian and Schneider, Jon and Skoulakis, Stratis and Viano, Luca and Cevher, Volkan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {45185--45211},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/muller2025icml-best/}
}