Achieving near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously

Abstract

In this work, we develop linear bandit algorithms that automatically adapt to different environments. By plugging a novel loss estimator into the optimization problem that characterizes the instance-optimal strategy, our first algorithm not only achieves nearly instance-optimal regret in stochastic environments, but also works in corrupted environments with additional regret proportional to the amount of corruption, while the state of the art (Li et al., 2019) achieves neither instance-optimality nor the optimal dependence on the corruption amount. Moreover, by equipping this algorithm with an adversarial component and carefully designed statistical tests, our second algorithm additionally enjoys minimax-optimal regret in completely adversarial environments, the first result of this kind to our knowledge. Finally, all our guarantees hold with high probability, whereas existing instance-optimal guarantees hold only in expectation.

Cite

Text

Lee et al. "Achieving near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously." International Conference on Machine Learning, 2021.

Markdown

[Lee et al. "Achieving near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/lee2021icml-achieving/)

BibTeX

@inproceedings{lee2021icml-achieving,
  title     = {{Achieving near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously}},
  author    = {Lee, Chung-Wei and Luo, Haipeng and Wei, Chen-Yu and Zhang, Mengxiao and Zhang, Xiaojin},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {6142--6151},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/lee2021icml-achieving/}
}