High-Dimensional Sparse Linear Bandits

Abstract

Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, such as personalized medicine and online advertising. We derive a novel O(n^2/3) dimension-free minimax regret lower bound for sparse linear bandits in the data-poor regime where the horizon is larger than the ambient dimension and where the feature vectors admit a well-conditioned exploration distribution. This is complemented by a nearly matching upper bound for an explore-then-commit algorithm showing that that O(n^2/3) is the optimal rate in the data-poor regime. The results complement existing bounds for the data-rich regime and also provide another example where carefully balancing the trade-off between information and regret is necessary. Finally, we prove a dimension-free O(\sqrt{n}) regret upper bound under an additional assumption on the magnitude of the signal for relevant features.

Cite

Text

Hao et al. "High-Dimensional Sparse Linear Bandits." Neural Information Processing Systems, 2020.

Markdown

[Hao et al. "High-Dimensional Sparse Linear Bandits." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/hao2020neurips-highdimensional/)

BibTeX

@inproceedings{hao2020neurips-highdimensional,
  title     = {{High-Dimensional Sparse Linear Bandits}},
  author    = {Hao, Botao and Lattimore, Tor and Wang, Mengdi},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/hao2020neurips-highdimensional/}
}