On Preference-Based Stochastic Linear Contextual Bandits with Knapsacks

Abstract

This paper studies the problem of preference-based stochastic linear contextual bandits with knapsack constraints (PbLinCBwK). We propose budget-aware optimistic and randomized exploration algorithms that achieve a regret of $O\big((\kappa+\frac{T\nu^*}{B})\sqrt{T}\log T\big)$ for any total budget $B=\Omega(\sqrt{T})$. The parameters $\kappa$ and $\frac{T\nu^*}{B}$ capture the effects of preference feedback and knapsack constraints, respectively. Our regret is near-optimal and matches the bound of LinCBwK under the mild condition $B=\Omega(\sqrt{T})$. To achieve these results, we view the budget consumption and stopping time as a Markov process and analyze it via the Lyapunov drift method, which translates into the strong regret guarantee. Experiments on a synthetic PbLinCBwK setup and an online content moderation setting further corroborate the theoretical results.
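
To make the budget-aware optimistic exploration idea concrete, below is a minimal Python sketch of a LinUCB-style loop with a single knapsack constraint. It is an illustration under stated assumptions, not the paper's PbLinCBwK algorithm: it observes rewards directly rather than preference feedback, and the confidence width alpha, the regularizer lam, and the pacing weight Z = T/B on a pessimistic cost estimate are all assumed choices.

import numpy as np

# Minimal sketch of budget-aware optimistic exploration for a linear
# contextual bandit with one knapsack constraint. Illustrative only:
# this LinUCB-style loop observes rewards directly, unlike the paper's
# preference-feedback setting, and alpha, lam, and the pacing weight
# Z = T/B on the pessimistic cost estimate are assumed choices.

rng = np.random.default_rng(0)
d, K, T, B = 5, 10, 2000, 200                      # dim, arms, horizon, budget
theta_r = rng.normal(size=d) / np.sqrt(d)          # unknown reward parameter
theta_c = np.abs(rng.normal(size=d)) / np.sqrt(d)  # unknown cost parameter

lam, alpha = 1.0, 1.0
V = lam * np.eye(d)                                # shared ridge design matrix
b_r, b_c = np.zeros(d), np.zeros(d)                # regression targets
budget, Z = float(B), T / B                        # remaining budget, pacing weight

for t in range(T):
    if budget <= 0:                                # stop once the knapsack is empty
        break
    X = rng.normal(size=(K, d)) / np.sqrt(d)       # fresh contexts for K arms
    V_inv = np.linalg.inv(V)
    th_r, th_c = V_inv @ b_r, V_inv @ b_c          # ridge estimates
    width = alpha * np.sqrt(np.einsum("ki,ij,kj->k", X, V_inv, X))
    # Optimistic reward minus pacing-weighted pessimistic (upper) cost.
    score = X @ th_r + width - Z * np.clip(X @ th_c + width, 0.0, None)
    x = X[int(np.argmax(score))]
    reward = x @ theta_r + 0.1 * rng.normal()
    cost = max(x @ theta_c + 0.1 * rng.normal(), 0.0)
    budget -= cost
    V += np.outer(x, x)
    b_r += reward * x
    b_c += cost * x

print(f"rounds played: {t + 1}, budget remaining: {budget:.2f}")

The pacing term mimics how knapsack-aware bandit algorithms trade reward against consumption: as B shrinks relative to T, the weight Z grows and cheap actions are preferred, which is the role the factor $\frac{T\nu^*}{B}$ plays in the regret bound above.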

Cite

Text

Liu. "On Preference-Based Stochastic Linear Contextual Bandits with Knapsacks." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.

Markdown

[Liu. "On Preference-Based Stochastic Linear Contextual Bandits with Knapsacks." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.](https://mlanthology.org/aistats/2025/liu2025aistats-preferencebased/)

BibTeX

@inproceedings{liu2025aistats-preferencebased,
  title     = {{On Preference-Based Stochastic Linear Contextual Bandits with Knapsacks}},
  author    = {Liu, Xin},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  year      = {2025},
  pages     = {2890--2898},
  volume    = {258},
  url       = {https://mlanthology.org/aistats/2025/liu2025aistats-preferencebased/}
}