Noise-Adaptive Thompson Sampling for Linear Contextual Bandits
Abstract
Linear contextual bandits are a fundamental class of models with numerous real-world applications, and it is critical to develop algorithms that can effectively manage noise with unknown variance, ensuring provable guarantees both for worst-case constant-variance noise and for deterministic rewards. In this paper, we study linear contextual bandits with heteroscedastic noise and propose the first noise-adaptive Thompson sampling-style algorithm that achieves a variance-dependent regret upper bound of $\widetilde O\Big(d^{3/2} + d^{3/2} \sqrt{\sum_{t=1}^T \sigma_t^2}\Big)$, where $d$ is the dimension of the context vectors and $\sigma_t^2$ is the variance of the reward in round $t$. This recovers the existing $\widetilde O(d^{3/2}\sqrt{T})$ regret guarantee in the constant-variance regime and further improves to $\widetilde O(d^{3/2})$ in the deterministic regime, thus achieving a smooth interpolation between the two. Our approach utilizes a stratified sampling procedure to overcome the overly conservative optimism of the linear Thompson sampling algorithm for linear contextual bandits.
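To make the interpolation explicit, the two regimes follow directly by substituting into the bound stated above (a quick check based only on the stated result):
\[
\sigma_t \equiv \sigma \;\Rightarrow\; \widetilde O\Big(d^{3/2} + d^{3/2}\sqrt{\textstyle\sum_{t=1}^T \sigma^2}\Big) = \widetilde O\big(d^{3/2}\,\sigma\sqrt{T}\big),
\qquad
\sigma_t \equiv 0 \;\Rightarrow\; \widetilde O\big(d^{3/2}\big).
\]
For context on the baseline referenced in the last sentence, below is a minimal sketch of standard linear Thompson sampling (ridge estimate plus Gaussian posterior sampling), not the noise-adaptive, stratified algorithm proposed in the paper; the class name `LinTS` and the inflation parameter `v` are illustrative.

```python
import numpy as np

class LinTS:
    """Minimal sketch of standard linear Thompson sampling (the baseline
    referenced in the abstract), NOT the paper's noise-adaptive algorithm."""

    def __init__(self, d, v=1.0, lam=1.0, seed=0):
        self.v = v                    # posterior-inflation scale (drives optimism)
        self.V = lam * np.eye(d)      # regularized Gram matrix
        self.b = np.zeros(d)          # running sum of r_t * x_t
        self.rng = np.random.default_rng(seed)

    def select(self, contexts):
        """contexts: (K, d) array of candidate context vectors for this round."""
        theta_hat = np.linalg.solve(self.V, self.b)       # ridge estimate
        cov = self.v ** 2 * np.linalg.inv(self.V)         # sampling covariance
        theta_tilde = self.rng.multivariate_normal(theta_hat, cov)
        return int(np.argmax(contexts @ theta_tilde))     # optimism via sampling

    def update(self, x, r):
        self.V += np.outer(x, x)
        self.b += r * x

# Toy usage: 2-dimensional contexts, 5 candidate arms in a round.
alg = LinTS(d=2)
ctx = np.random.default_rng(1).standard_normal((5, 2))
arm = alg.select(ctx)
alg.update(ctx[arm], r=1.0)
```

In this vanilla form the sampling covariance $v^2 V^{-1}$ is calibrated for worst-case constant-variance noise regardless of the actual per-round variances, which is one way to see the conservative optimism the abstract refers to.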
Cite
Text
Xu et al. "Noise-Adaptive Thompson Sampling for Linear Contextual Bandits." Neural Information Processing Systems, 2023.
Markdown
[Xu et al. "Noise-Adaptive Thompson Sampling for Linear Contextual Bandits." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/xu2023neurips-noiseadaptive/)
BibTeX
@inproceedings{xu2023neurips-noiseadaptive,
title = {{Noise-Adaptive Thompson Sampling for Linear Contextual Bandits}},
author = {Xu, Ruitu and Min, Yifei and Wang, Tianhao},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/xu2023neurips-noiseadaptive/}
}