Piecewise-Stationary Bandits with Knapsacks

Abstract

We study Bandits with Knapsacks (Bwk) in a piecewise-stationary environment. We propose a novel inventory reserving algorithm which draws new insights into the problem. Suppose parameters $\eta_{\min}, \eta_{\max} \in (0,1]$ respectively lower and upper bound the reward earned and the resources consumed in a time round. Our algorithm achieves a provably near-optimal competitive ratio of $O(\log(\eta_{\max}/\eta_{\min}))$, with a matching lower bound provided. Our performance guarantee is based on a dynamic benchmark, distinguishing our work from existing works on adversarial Bwk who compare with the static benchmark. Furthermore, different from existing non-stationary Bwk work, we do not require a bounded global variation.

Cite

Text

Zhang and Chi. "Piecewise-Stationary Bandits with Knapsacks." Neural Information Processing Systems, 2024. doi:10.52202/079017-1781

Markdown

[Zhang and Chi. "Piecewise-Stationary Bandits with Knapsacks." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/zhang2024neurips-piecewisestationary/) doi:10.52202/079017-1781

BibTeX

@inproceedings{zhang2024neurips-piecewisestationary,
  title     = {{Piecewise-Stationary Bandits with Knapsacks}},
  author    = {Zhang, Xilin and Chi, Cheung Wang},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1781},
  url       = {https://mlanthology.org/neurips/2024/zhang2024neurips-piecewisestationary/}
}