Thresholded Linear Bandits

Abstract

We introduce the thresholded linear bandit problem, a novel sequential decision-making problem at the interface of structured stochastic multi-armed bandits and learning halfspaces. The set of arms is $[0, 1]^d$, the expected Bernoulli reward is piecewise constant with a jump at a separating hyperplane, and each arm is associated with a cost that is a positive linear combination of the arm’s components. This problem is motivated by several practical applications. For instance, imagine tuning the continuous features of an offer to a consumer; higher values incur higher cost to the vendor but result in a more attractive offer. At some threshold, the offer becomes attractive enough for a random consumer to accept with the higher probability. For the one-dimensional case, we present Leftist, which enjoys $\log^2 T$ problem-dependent regret in favorable cases and has $\log(T) \sqrt{T}$ worst-case regret; we also give a lower bound suggesting this is unimprovable. We then present MD-Leftist, our extension of Leftist to the multi-dimensional case, which obtains similar regret bounds but with $d^{2.5} \log d$ and $d^{1.5} \log d$ dependence on dimension for the two types of bounds respectively. Finally, we experimentally evaluate Leftist.
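To make the problem setup concrete, the following is a minimal sketch of the one-dimensional environment described in the abstract: pulling an arm $x \in [0, 1]$ yields a Bernoulli reward whose mean jumps from a low to a high level at an unknown threshold, while the arm's cost grows linearly in $x$. All names and parameter values here (`make_env`, `theta`, `p_low`, `p_high`, `cost_weight`) are illustrative assumptions, not taken from the paper.

```python
import random

def make_env(theta=0.4, p_low=0.3, p_high=0.7, cost_weight=1.0, seed=0):
    """Illustrative 1-d thresholded bandit environment (an assumption-laden
    sketch, not the paper's formal model). Pulling arm x in [0, 1] returns a
    Bernoulli reward with mean p_low below the unknown threshold theta and
    p_high at or above it, together with a cost linear in x."""
    rng = random.Random(seed)

    def pull(x):
        assert 0.0 <= x <= 1.0, "arms live in [0, 1]"
        mean = p_high if x >= theta else p_low  # piecewise-constant reward mean
        reward = 1 if rng.random() < mean else 0  # Bernoulli draw
        cost = cost_weight * x  # positive linear cost in the arm's component
        return reward, cost

    return pull

pull = make_env()
reward, cost = pull(0.5)  # arm above the (unknown) threshold: mean reward p_high
```

A learner in this setting must locate the threshold while balancing the linear cost of pulling larger arms, which is the tension Leftist is designed to manage.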

Cite

Text

Mehta et al. "Thresholded Linear Bandits." Artificial Intelligence and Statistics, 2023.

Markdown

[Mehta et al. "Thresholded Linear Bandits." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/mehta2023aistats-thresholded/)

BibTeX

@inproceedings{mehta2023aistats-thresholded,
  title     = {{Thresholded Linear Bandits}},
  author    = {Mehta, Nishant A. and Komiyama, Junpei and Potluru, Vamsi K. and Nguyen, Andrea and Grant-Hagen, Mica},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {6968--7020},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/mehta2023aistats-thresholded/}
}