Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits

Abstract

We study the sample complexity of identifying the pure strategy Nash equilibrium (PSNE) in a two-player zero-sum matrix game with noise. Formally, we are given a stochastic model where any learner can sample an entry $(i,j)$ of the input matrix $A\in [-1,1]^{n\times m}$ and observe $A_{i,j}+\eta$ where $\eta$ is a zero-mean $1$-sub-Gaussian noise. The aim of the learner is to identify the PSNE of $A$, whenever it exists, with high probability while taking as few samples as possible. Zhou et al., (2017) presents an instance-dependent sample complexity lower bound that depends only on the entries in the row and column in which the PSNE lies. We design a near-optimal algorithm whose sample complexity matches the lower bound, up to log factors. The problem of identifying the PSNE also generalizes the problem of pure exploration in stochastic multi-armed bandits and dueling bandits, and our result matches the optimal bounds, up to log factors, in both the settings.

Cite

Text

Maiti et al. "Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits." Artificial Intelligence and Statistics, 2024.

Markdown

[Maiti et al. "Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits." Artificial Intelligence and Statistics, 2024.](https://mlanthology.org/aistats/2024/maiti2024aistats-nearoptimal/)

BibTeX

@inproceedings{maiti2024aistats-nearoptimal,
  title     = {{Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits}},
  author    = {Maiti, Arnab and Boczar, Ross and Jamieson, Kevin and Ratliff, Lillian},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2024},
  pages     = {2602-2610},
  volume    = {238},
  url       = {https://mlanthology.org/aistats/2024/maiti2024aistats-nearoptimal/}
}