Expected Improvement for Contextual Bandits

Abstract

The expected improvement (EI) is a popular technique for handling the tradeoff between exploration and exploitation under uncertainty. This technique has been widely used in Bayesian optimization, but it is not applicable to the contextual bandit problem, which is a generalization of both the standard bandit problem and Bayesian optimization. In this paper, we initiate the study of the EI technique for contextual bandits from both theoretical and practical perspectives. We propose two novel EI-based algorithms, one for the case where the reward function is assumed to be linear and the other for more general reward functions. With linear reward functions, we demonstrate that our algorithm achieves a near-optimal regret. Notably, our regret bound improves that of LinTS \cite{agrawal13} by a factor of $\sqrt{d}$, while avoiding the NP-hard optimization problem that LinUCB \cite{Abbasi11} must solve at each iteration. For more general reward functions, modeled by deep neural networks, we prove that our algorithm achieves a $\tilde{\mathcal O} (\tilde{d}\sqrt{T})$ regret, where $\tilde{d}$ is the effective dimension of a neural tangent kernel (NTK) matrix and $T$ is the number of iterations. Our experiments on various benchmark datasets show that both proposed algorithms work well and consistently outperform existing approaches, especially in high dimensions.
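To illustrate the acquisition rule the abstract builds on, the following is a minimal sketch of the standard EI criterion under a Gaussian posterior, applied to arm selection in a toy contextual round. The posterior means and standard deviations are made-up placeholders (in the paper's linear setting they would come from a Bayesian linear model of the reward given the context); this is not the paper's exact algorithm, only the closed-form EI score it is based on.

```python
import math

def expected_improvement(mu, sigma, best):
    """Closed-form EI of a Gaussian posterior N(mu, sigma^2) over incumbent `best`."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF at z
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

# Toy round: hypothetical posterior means/stds for three context-dependent arms.
means = [0.40, 0.55, 0.50]
stds  = [0.30, 0.05, 0.20]
best = max(means)  # incumbent: current best posterior mean
scores = [expected_improvement(m, s, best) for m, s in zip(means, stds)]
chosen = max(range(len(scores)), key=scores.__getitem__)
```

Note how EI trades off exploitation and exploration in one score: the uncertain arm 0 outscores the higher-mean but well-estimated arm 1, so it is the one selected.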

Cite

Text

Tran-The et al. "Expected Improvement for Contextual Bandits." Neural Information Processing Systems, 2022.

Markdown

[Tran-The et al. "Expected Improvement for Contextual Bandits." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/tranthe2022neurips-expected/)

BibTeX

@inproceedings{tranthe2022neurips-expected,
  title     = {{Expected Improvement for Contextual Bandits}},
  author    = {Tran-The, Hung and Gupta, Sunil and Rana, Santu and Truong, Tuan and Tran-Thanh, Long and Venkatesh, Svetha},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/tranthe2022neurips-expected/}
}