Linear Bandits on Uniformly Convex Sets
Abstract
Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets $\mathcal{K}\subset\mathbb{R}^n$ and two types of structural assumptions lead to better pseudo-regret bounds. When $\mathcal{K}$ is the simplex or an $\ell_p$ ball with $p\in]1,2]$, there exist bandits algorithms with $\tilde{\mathcal{O}}(\sqrt{nT})$ pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond $\ell_p$ balls that enjoy pseudo-regret bounds of $\tilde{\mathcal{O}}(\sqrt{nT})$. This result provides new elements for the open question in Bubeck and Cesa-Bianchi, 2012. When the action set is $q$-uniformly convex but not necessarily strongly convex ($q >2$), we obtain pseudo-regret bounds $\tilde{\mathcal{O}}(n^{1/q}T^{1/p})$ with $p$ s.t. $1/p + 1/q=1$. These pseudo-regret bounds are competitive with the general $\tilde{\mathcal{O}}(n\sqrt{T})$ for a time horizon range that depends on the degree $q>2$ of the set's uniform convexity and the dimension $n$ of the problem.
Cite
Text
Kerdreux et al. "Linear Bandits on Uniformly Convex Sets." Journal of Machine Learning Research, 2021.Markdown
[Kerdreux et al. "Linear Bandits on Uniformly Convex Sets." Journal of Machine Learning Research, 2021.](https://mlanthology.org/jmlr/2021/kerdreux2021jmlr-linear/)BibTeX
@article{kerdreux2021jmlr-linear,
title = {{Linear Bandits on Uniformly Convex Sets}},
author = {Kerdreux, Thomas and Roux, Christophe and d'Aspremont, Alexandre and Pokutta, Sebastian},
journal = {Journal of Machine Learning Research},
year = {2021},
pages = {1-23},
volume = {22},
url = {https://mlanthology.org/jmlr/2021/kerdreux2021jmlr-linear/}
}