An Exploration-by-Optimization Approach to Best of Both Worlds in Linear Bandits

Abstract

In this paper, we consider how to construct best-of-both-worlds linear bandit algorithms that achieve nearly optimal performance in both stochastic and adversarial environments. For this purpose, we show that a natural approach referred to as exploration by optimization [Lattimore and Szepesvári, 2020] works well. Specifically, an algorithm constructed using this approach achieves $O(d \sqrt{T \log T})$-regret in adversarial environments and $O(\frac{d^2 \log T}{\Delta_{\min}})$-regret in stochastic environments. Here, $d$, $T$, and $\Delta_{\min}$ denote the dimensionality of the action set, the time horizon, and the minimum sub-optimality gap, respectively. We also show that this algorithm has even better theoretical guarantees for important special cases, including the multi-armed bandit problem and multitask bandits.
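To get a feel for how the two guarantees compare, the short sketch below evaluates the leading terms $d \sqrt{T \log T}$ and $d^2 \log T / \Delta_{\min}$ for purely illustrative values of $d$, $T$, and $\Delta_{\min}$ (none of these numbers come from the paper); it is a numeric illustration of the stated bounds, not an implementation of the algorithm.

```python
import math


def adversarial_bound(d: int, T: int) -> float:
    """Leading term of the adversarial-regime bound, d * sqrt(T * log T)."""
    return d * math.sqrt(T * math.log(T))


def stochastic_bound(d: int, T: int, delta_min: float) -> float:
    """Leading term of the stochastic-regime bound, d^2 * log T / Delta_min."""
    return d ** 2 * math.log(T) / delta_min


# Illustrative values only: d = 10 dimensions, minimum gap 0.1.
d, delta_min = 10, 0.1
for T in (10 ** 4, 10 ** 6, 10 ** 8):
    print(f"T = {T:>9}: adversarial ~ {adversarial_bound(d, T):,.0f}, "
          f"stochastic ~ {stochastic_bound(d, T, delta_min):,.0f}")
```

As the horizon $T$ grows, the stochastic bound grows only logarithmically in $T$, while the adversarial bound grows roughly as $\sqrt{T}$, which is the usual trade-off a best-of-both-worlds guarantee captures in a single algorithm.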

Cite

Text

Ito and Takemura. "An Exploration-by-Optimization Approach to Best of Both Worlds in Linear Bandits." Neural Information Processing Systems, 2023.

Markdown

[Ito and Takemura. "An Exploration-by-Optimization Approach to Best of Both Worlds in Linear Bandits." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/ito2023neurips-explorationbyoptimization/)

BibTeX

@inproceedings{ito2023neurips-explorationbyoptimization,
  title     = {{An Exploration-by-Optimization Approach to Best of Both Worlds in Linear Bandits}},
  author    = {Ito, Shinji and Takemura, Kei},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/ito2023neurips-explorationbyoptimization/}
}