OSOM: A Simultaneously Optimal Algorithm for Multi-Armed and Linear Contextual Bandits

Abstract

We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure, in which the rewards are independent of the contextual information. Algorithms that are designed solely for one of the regimes are known to be sub-optimal for the alternate regime. We design a single computationally efficient algorithm that simultaneously obtains problem-dependent optimal regret rates in the simple multi-armed bandit regime and minimax optimal regret rates in the linear contextual bandit regime, without knowing a priori which of the two models generates the rewards. These results are proved under the condition that the contextual information is stochastic over multiple rounds. Our results should be viewed as a step towards principled data-dependent policy class selection for contextual bandits.

Cite

Text

Chatterji et al. "OSOM: A Simultaneously Optimal Algorithm for Multi-Armed and Linear Contextual Bandits." Artificial Intelligence and Statistics, 2020.

Markdown

[Chatterji et al. "OSOM: A Simultaneously Optimal Algorithm for Multi-Armed and Linear Contextual Bandits." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/chatterji2020aistats-osom/)

BibTeX

@inproceedings{chatterji2020aistats-osom,
  title     = {{OSOM: A Simultaneously Optimal Algorithm for Multi-Armed and Linear Contextual Bandits}},
  author    = {Chatterji, Niladri and Muthukumar, Vidya and Bartlett, Peter},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {1844--1854},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/chatterji2020aistats-osom/}
}