Piecewise-Stationary Bandits with Knapsacks
Abstract
We study Bandits with Knapsacks (Bwk) in a piecewise-stationary environment. We propose a novel inventory reserving algorithm which draws new insights into the problem. Suppose parameters $\eta_{\min}, \eta_{\max} \in (0,1]$ respectively lower and upper bound the reward earned and the resources consumed in a time round. Our algorithm achieves a provably near-optimal competitive ratio of $O(\log(\eta_{\max}/\eta_{\min}))$, with a matching lower bound provided. Our performance guarantee is based on a dynamic benchmark, distinguishing our work from existing works on adversarial Bwk who compare with the static benchmark. Furthermore, different from existing non-stationary Bwk work, we do not require a bounded global variation.
Cite
Text
Zhang and Chi. "Piecewise-Stationary Bandits with Knapsacks." Neural Information Processing Systems, 2024. doi:10.52202/079017-1781Markdown
[Zhang and Chi. "Piecewise-Stationary Bandits with Knapsacks." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/zhang2024neurips-piecewisestationary/) doi:10.52202/079017-1781BibTeX
@inproceedings{zhang2024neurips-piecewisestationary,
title = {{Piecewise-Stationary Bandits with Knapsacks}},
author = {Zhang, Xilin and Chi, Cheung Wang},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-1781},
url = {https://mlanthology.org/neurips/2024/zhang2024neurips-piecewisestationary/}
}