Regret Minimization via Saddle Point Optimization

Abstract

A long line of works characterizes the sample complexity of regret minimization in sequential decision-making by min-max programs. In the corresponding saddle-point game, the min-player optimizes the sampling distribution against an adversarial max-player that chooses confusing models leading to large regret. The most recent instantiation of this idea is the decision-estimation coefficient (DEC), which was shown to provide nearly tight lower and upper bounds on the worst-case expected regret in structured bandits and reinforcement learning. By re-parametrizing the offset DEC with the confidence radius and solving the corresponding min-max program, we derive an anytime variant of the Estimation-To-Decisions algorithm (Anytime-E2D). Importantly, the algorithm optimizes the exploration-exploitation trade-off online instead of via the analysis. Our formulation leads to a practical algorithm for finite model classes and linear feedback models. We further point out connections to the information ratio, decoupling coefficient and PAC-DEC, and numerically evaluate the performance of E2D on simple examples.

Cite

Text

Kirschner et al. "Regret Minimization via Saddle Point Optimization." Neural Information Processing Systems, 2023.

Markdown

[Kirschner et al. "Regret Minimization via Saddle Point Optimization." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/kirschner2023neurips-regret/)

BibTeX

@inproceedings{kirschner2023neurips-regret,
  title     = {{Regret Minimization via Saddle Point Optimization}},
  author    = {Kirschner, Johannes and Bakhtiari, Alireza and Chandak, Kushagra and Tkachuk, Volodymyr and Szepesvari, Csaba},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/kirschner2023neurips-regret/}
}