Strategizing Against No-Regret Learners

Abstract

How should a player who repeatedly plays a game against a no-regret learner strategize to maximize his utility? We study this question and show that, under some mild assumptions, the player can always guarantee himself a utility of at least what he would get in a Stackelberg equilibrium. When the no-regret learner has only two actions, we show that the player cannot get utility higher than the Stackelberg equilibrium utility. But when the no-regret learner has more than two actions and plays a mean-based no-regret strategy, we show that the player can get utility strictly higher than the Stackelberg equilibrium utility. We construct the optimal game-play for the player against a mean-based no-regret learner who has three actions. When the no-regret learner's strategy also guarantees no swap regret, we show that the player cannot get utility higher than the Stackelberg equilibrium utility.
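To make the notion of a mean-based no-regret learner concrete, here is a minimal sketch of Hedge (multiplicative weights), a standard no-regret algorithm that weights each action by its cumulative reward. It is an illustration only, not the construction from the paper: the function names, the learning rate `eta`, and the example payoff matrix are all assumptions introduced here, and the learner is assumed to see full-information feedback with rewards in [0, 1].

```python
import math


def hedge_distribution(cumulative_rewards, eta):
    """Return the Hedge distribution over the learner's actions.

    Each action's weight is exp(eta * cumulative reward), so actions whose
    cumulative reward lags far behind the best one get negligible probability.
    """
    # Subtracting the maximum keeps the exponentials numerically stable
    # and does not change the normalized distribution.
    best = max(cumulative_rewards)
    weights = [math.exp(eta * (r - best)) for r in cumulative_rewards]
    total = sum(weights)
    return [w / total for w in weights]


def play_against_fixed_action(learner_payoffs, optimizer_action, rounds, eta):
    """Simulate Hedge when the optimizer repeats one pure action.

    learner_payoffs[i][j] is the learner's reward (assumed to lie in [0, 1])
    when the optimizer plays action i and the learner plays action j.
    Returns the distribution the learner used in the final round.
    """
    n_actions = len(learner_payoffs[optimizer_action])
    cumulative = [0.0] * n_actions
    dist = hedge_distribution(cumulative, eta)
    for _ in range(rounds):
        dist = hedge_distribution(cumulative, eta)
        # Full-information feedback: the learner observes the reward that
        # every one of its actions would have earned this round.
        for j in range(n_actions):
            cumulative[j] += learner_payoffs[optimizer_action][j]
    return dist


if __name__ == "__main__":
    # Hypothetical payoffs: one optimizer action, three learner actions.
    payoffs = [[1.0, 0.0, 0.5]]
    print(play_against_fixed_action(payoffs, 0, rounds=200, eta=0.5))
```

Against an optimizer that never changes its action, the learner's play concentrates on its best response, which is the behavior a Stackelberg leader relies on. The strictly higher utility described in the abstract requires the optimizer to vary its play over time, which this sketch does not attempt.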

Cite

Text

Deng et al. "Strategizing Against No-Regret Learners." Neural Information Processing Systems, 2019.

Markdown

[Deng et al. "Strategizing Against No-Regret Learners." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/deng2019neurips-strategizing/)

BibTeX

@inproceedings{deng2019neurips-strategizing,
  title     = {{Strategizing Against No-Regret Learners}},
  author    = {Deng, Yuan and Schneider, Jon and Sivan, Balasubramanian},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {1579--1587},
  url       = {https://mlanthology.org/neurips/2019/deng2019neurips-strategizing/}
}