First- and Second-Order Bounds for Adversarial Linear Contextual Bandits

Abstract

We consider the adversarial linear contextual bandit setting, whichallows for the loss functions associated with each of $K$ arms to changeover time without restriction. Assuming the $d$-dimensional contexts aredrawn from a fixed known distribution, the worst-case expected regretover the course of $T$ rounds is known to scale as $\tilde O(\sqrt{KdT})$. Under the additional assumption that the density of the contextsis log-concave, we obtain a second-order bound of order $\tildeO(K\sqrt{d V_T})$ in terms of the cumulative second moment of thelearner's losses $V_T$, and a closely related first-order bound of order$\tilde O(K\sqrt{d L_T^*})$ in terms of the cumulative loss of the bestpolicy $L_T^*$. Since $V_T$ or $L_T^*$ may be significantly smaller than$T$, these improve over the worst-case regret whenever the environmentis relatively benign. Our results are obtained using a truncated versionof the continuous exponential weights algorithm over the probabilitysimplex, which we analyse by exploiting a novel connection to the linearbandit setting without contexts.

Cite

Text

Olkhovskaya et al. "First- and Second-Order Bounds for Adversarial Linear Contextual Bandits." Neural Information Processing Systems, 2023.

Markdown

[Olkhovskaya et al. "First- and Second-Order Bounds for Adversarial Linear Contextual Bandits." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/olkhovskaya2023neurips-first/)

BibTeX

@inproceedings{olkhovskaya2023neurips-first,
  title     = {{First- and Second-Order Bounds for Adversarial Linear Contextual Bandits}},
  author    = {Olkhovskaya, Julia and Mayo, Jack and van Erven, Tim and Neu, Gergely and Wei, Chen-Yu},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/olkhovskaya2023neurips-first/}
}