Improved Pure Exploration in Linear Bandits with No-Regret Learning
Abstract
We study the best arm identification problem in linear multi-armed bandits (LMAB) in the fixed confidence setting ; this is also the problem of optimizing an unknown linear function over a discrete domain with noisy, zeroth-order access. We propose an explicitly implementable and provably order-optimal sample-complexity algorithm for best arm identification. Existing approaches that achieve optimal sample complexity assume either access to a nontrivial minimax optimization oracle (e.g. RAGE and Lazy-TS) or knowledge of an upper-bound on the norm of the unknown parameter vector(e.g. LinGame). Our algorithm, which we call the Phased Elimination Linear Exploration Game (PELEG), maintains a high-probability confidence ellipsoid containing the unknown vector, and uses it to eliminate suboptimal arms in phases, where a minimax problem is essentially solved in each phase using two-player low regret learners. PELEG does not require to know a bound on norm of the unknown vector, and is asymptotically-optimal, settling an open problem. We show that the sample complexity of PELEG matches, up to order and in a non-asymptotic sense, an instance-dependent universal lower bound for linear bandits. PELEG is thus the first algorithm to achieve both order-optimal sample complexity and explicit implementability for this setting. We also provide numerical results for the proposed algorithm consistent with its theoretical guarantees.
Cite
Text
Zaki et al. "Improved Pure Exploration in Linear Bandits with No-Regret Learning." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/515Markdown
[Zaki et al. "Improved Pure Exploration in Linear Bandits with No-Regret Learning." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/zaki2022ijcai-improved/) doi:10.24963/IJCAI.2022/515BibTeX
@inproceedings{zaki2022ijcai-improved,
title = {{Improved Pure Exploration in Linear Bandits with No-Regret Learning}},
author = {Zaki, Mohammadi and Mohan, Avi and Gopalan, Aditya},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2022},
pages = {3709-3715},
doi = {10.24963/IJCAI.2022/515},
url = {https://mlanthology.org/ijcai/2022/zaki2022ijcai-improved/}
}