Non-Stationary Bandits with Knapsacks

Abstract

In this paper, we study the problem of bandits with knapsacks (BwK) in a non-stationary environment. The BwK problem generalizes the multi-arm bandit (MAB) problem to model the resource consumption associated with playing each arm. At each time, the decision maker/player chooses to play an arm, and s/he will receive a reward and consume certain amount of resource from each of the multiple resource types. The objective is to maximize the cumulative reward over a finite horizon subject to some knapsack constraints on the resources. Existing works study the BwK problem under either a stochastic or adversarial environment. Our paper considers a non-stationary environment which continuously interpolates between these two extremes. We first show that the traditional notion of variation budget is insufficient to characterize the non-stationarity of the BwK problem for a sublinear regret due to the presence of the constraints, and then we propose a new notion of global non-stationarity measure. We employ both non-stationarity measures to derive upper and lower bounds for the problem. Our results are based on a primal-dual analysis of the underlying linear programs and highlight the interplay between the constraints and the non-stationarity. Finally, we also extend the non-stationarity measure to the problem of online convex optimization with constraints and obtain new regret bounds accordingly.

Cite

Text

Liu et al. "Non-Stationary Bandits with Knapsacks." Neural Information Processing Systems, 2022.

Markdown

[Liu et al. "Non-Stationary Bandits with Knapsacks." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/liu2022neurips-nonstationary-a/)

BibTeX

@inproceedings{liu2022neurips-nonstationary-a,
  title     = {{Non-Stationary Bandits with Knapsacks}},
  author    = {Liu, Shang and Jiang, Jiashuo and Li, Xiaocheng},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/liu2022neurips-nonstationary-a/}
}