Bandits with Knapsacks and Predictions

Abstract

We study the Bandits with Knapsacks problem with the aim of designing learning-augmented online algorithms that achieve better regret guarantees than state-of-the-art primal-dual algorithms with worst-case guarantees, under both stochastic and adversarial inputs. In the adversarial case, we obtain better competitive ratios when the input predictions are accurate, while retaining worst-case guarantees when they are imprecise. We introduce two algorithms, tailored to the full-feedback and bandit-feedback settings, respectively. Both integrate a static prediction with a worst-case no-$\alpha$-regret algorithm. This yields an optimized competitive ratio of $(\pi + (1-\pi)/\alpha)^{-1}$ when the prediction is perfect, and a competitive ratio of $\alpha/(1-\pi)$ when the prediction is highly imprecise, where $\pi \in (0,1)$ is chosen by the learner and $\alpha$ is Slater’s parameter. We complement this analysis by studying the stochastic setting under full feedback. There, we provide an algorithm that guarantees a pseudo-regret of $\widetilde{O}(\sqrt{T})$ with poor predictions and zero pseudo-regret with perfect predictions. We also characterize the smoothness of the algorithm, that is, how its guarantees degrade gracefully with the quality of the prediction.
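
To see where the two adversarial ratios could come from, here is one natural accounting; the assumption that the budget is split into a $\pi$ fraction spent following the prediction and a $(1-\pi)$ fraction handed to the no-$\alpha$-regret routine is an illustrative sketch, not a restatement of the paper's analysis. If the prediction is perfect, each part collects reward in proportion to its budget share, with the robust part losing a factor $\alpha$:

$$\mathrm{REW} \;\ge\; \pi \cdot \mathrm{OPT} \;+\; \frac{1-\pi}{\alpha}\,\mathrm{OPT} \quad\Longrightarrow\quad \frac{\mathrm{OPT}}{\mathrm{REW}} \;\le\; \Bigl(\pi + \frac{1-\pi}{\alpha}\Bigr)^{-1}.$$

If the prediction is worthless, only the robust part contributes:

$$\mathrm{REW} \;\ge\; \frac{1-\pi}{\alpha}\,\mathrm{OPT} \quad\Longrightarrow\quad \frac{\mathrm{OPT}}{\mathrm{REW}} \;\le\; \frac{\alpha}{1-\pi},$$

matching the two competitive ratios stated above.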

Cite

Text

Drago et al. "Bandits with Knapsacks and Predictions." Uncertainty in Artificial Intelligence, 2024.

Markdown

[Drago et al. "Bandits with Knapsacks and Predictions." Uncertainty in Artificial Intelligence, 2024.](https://mlanthology.org/uai/2024/drago2024uai-bandits/)

BibTeX

@inproceedings{drago2024uai-bandits,
  title     = {{Bandits with Knapsacks and Predictions}},
  author    = {Drago, Davide and Celli, Andrea and Elias, Marek},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2024},
  pages     = {1189--1206},
  volume    = {244},
  url       = {https://mlanthology.org/uai/2024/drago2024uai-bandits/}
}