Stochastic Bandits with ReLU Neural Networks

Abstract

We study the stochastic bandit problem with ReLU neural network structure. We show that an $\tilde{O}(\sqrt{T})$ regret guarantee is achievable by considering bandits with one-layer ReLU neural networks; to the best of our knowledge, our work is the first to achieve such a guarantee. In this specific setting, we propose an OFU-ReLU algorithm that can achieve this upper bound. The algorithm first explores randomly until it reaches a linear regime, and then implements a UCB-type linear bandit algorithm to balance exploration and exploitation. Our key insight is that we can exploit the piecewise linear structure of ReLU activations and convert the problem into a linear bandit in a transformed feature space, once we learn the parameters of the ReLU model relatively accurately during the exploration stage. To remove dependence on model parameters, we design an OFU-ReLU+ algorithm based on a batching strategy, which can provide the same theoretical guarantee.
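The sketch below is a minimal illustration of the explore-then-linearize idea described in the abstract, not the authors' OFU-ReLU implementation. It assumes a one-layer ReLU reward model and uses uniform random exploration followed by LinUCB on features masked by the estimated activation pattern; all constants (exploration length `T0`, UCB width `alpha`, learning rate) and the plain gradient-descent fitting step are illustrative choices.

```python
# Illustrative sketch of explore-then-linearize for a one-layer ReLU bandit.
# Phase 1: uniform exploration + least-squares fit of the ReLU parameters.
# Phase 2: LinUCB on features transformed by the learned activation pattern.
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical one-layer ReLU bandit environment ---
d, n_neurons, n_arms, T = 5, 3, 20, 2000
W_true = rng.normal(size=(n_neurons, d))          # unknown hidden-layer weights
v_true = rng.normal(size=n_neurons)               # unknown output weights
arms = rng.normal(size=(n_arms, d))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)

def reward(x):
    """Noisy reward f(x) = v^T relu(W x) + noise."""
    return v_true @ np.maximum(W_true @ x, 0.0) + 0.1 * rng.normal()

# --- Phase 1: random exploration, fit ReLU parameters by gradient descent ---
T0 = 500                                          # exploration length (assumed)
X = arms[rng.integers(n_arms, size=T0)]
y = np.array([reward(x) for x in X])

W_hat = 0.1 * rng.normal(size=(n_neurons, d))
v_hat = 0.1 * rng.normal(size=n_neurons)
lr = 0.05
for _ in range(2000):                             # plain gradient descent on squared loss
    h = np.maximum(X @ W_hat.T, 0.0)              # hidden activations, (T0, n_neurons)
    resid = h @ v_hat - y
    grad_v = h.T @ resid / T0
    grad_W = ((resid[:, None] * (X @ W_hat.T > 0)) * v_hat).T @ X / T0
    v_hat -= lr * grad_v
    W_hat -= lr * grad_W

# --- Phase 2: LinUCB on features linearized by the estimated active sets ---
def phi(x):
    """Transformed feature: x masked by each neuron's estimated activation."""
    mask = (W_hat @ x > 0).astype(float)          # (n_neurons,)
    return (mask[:, None] * x[None, :]).ravel()   # (n_neurons * d,)

p = n_neurons * d
A = np.eye(p)                                     # ridge-regularized design matrix
b = np.zeros(p)
alpha = 1.0                                       # UCB width (assumed)
for t in range(T0, T):
    theta = np.linalg.solve(A, b)
    feats = np.array([phi(x) for x in arms])
    ucb = feats @ theta + alpha * np.sqrt(
        np.einsum("ij,jk,ik->i", feats, np.linalg.inv(A), feats))
    x = arms[int(np.argmax(ucb))]
    r = reward(x)
    f = phi(x)
    A += np.outer(f, f)
    b += r * f
```

The key step mirrored here is `phi`: once the estimated weights pin down which neurons are active for each arm, the reward is linear in the masked features, so a standard UCB-type linear bandit routine can balance exploration and exploitation in that transformed space.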

Cite

Text

Xu et al. "Stochastic Bandits with ReLU Neural Networks." International Conference on Machine Learning, 2024.

Markdown

[Xu et al. "Stochastic Bandits with ReLU Neural Networks." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/xu2024icml-stochastic/)

BibTeX

@inproceedings{xu2024icml-stochastic,
  title     = {{Stochastic Bandits with ReLU Neural Networks}},
  author    = {Xu, Kan and Bastani, Hamsa and Goel, Surbhi and Bastani, Osbert},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {54866--54887},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/xu2024icml-stochastic/}
}