Linear Bandits on Uniformly Convex Sets

Abstract

Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets $\mathcal{K}\subset\mathbb{R}^n$ and two types of structural assumptions lead to better pseudo-regret bounds. When $\mathcal{K}$ is the simplex or an $\ell_p$ ball with $p\in]1,2]$, there exist bandits algorithms with $\tilde{\mathcal{O}}(\sqrt{nT})$ pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond $\ell_p$ balls that enjoy pseudo-regret bounds of $\tilde{\mathcal{O}}(\sqrt{nT})$. This result provides new elements for the open question in Bubeck and Cesa-Bianchi, 2012. When the action set is $q$-uniformly convex but not necessarily strongly convex ($q >2$), we obtain pseudo-regret bounds $\tilde{\mathcal{O}}(n^{1/q}T^{1/p})$ with $p$ s.t. $1/p + 1/q=1$. These pseudo-regret bounds are competitive with the general $\tilde{\mathcal{O}}(n\sqrt{T})$ for a time horizon range that depends on the degree $q>2$ of the set's uniform convexity and the dimension $n$ of the problem.

Cite

Text

Kerdreux et al. "Linear Bandits on Uniformly Convex Sets." Journal of Machine Learning Research, 2021.

Markdown

[Kerdreux et al. "Linear Bandits on Uniformly Convex Sets." Journal of Machine Learning Research, 2021.](https://mlanthology.org/jmlr/2021/kerdreux2021jmlr-linear/)

BibTeX

@article{kerdreux2021jmlr-linear,
  title     = {{Linear Bandits on Uniformly Convex Sets}},
  author    = {Kerdreux, Thomas and Roux, Christophe and d'Aspremont, Alexandre and Pokutta, Sebastian},
  journal   = {Journal of Machine Learning Research},
  year      = {2021},
  pages     = {1-23},
  volume    = {22},
  url       = {https://mlanthology.org/jmlr/2021/kerdreux2021jmlr-linear/}
}