Stochastic Constrained Contextual Bandits via Lyapunov Optimization Based Estimation to Decision Framework

Abstract

This paper studies the problem of stochastic constrained contextual bandits (CCB) under general realizability condition where the expected rewards and costs are within general function classes. We propose LOE2D, a Lyapunov Optimization Based Estimation to Decision framework with online regression oracles for learning reward/constraint. LOE2D establishes $\Tilde O(T^{\frac{3}{4}}U^{\frac{1}{4}})$ regret and constraint violation, which can be further refined to $\Tilde O(\min\{\sqrt{TU}/\varepsilon^2, T^{\frac{3}{4}}U^{\frac{1}{4}}\})$ when the Slater condition holds in the underlying offline problem with the Slater “constant” $ \varepsilon=\Omega(\sqrt{U/T}),$ where $U$ denotes the error bounds of online regression oracles. These results improve LagrangeCBwLC in two aspects: i) our results hold without any prior information while LagrangeCBwLC requires the knowledge of Slater constant to design a proper learning rate; ii) our results hold when $\varepsilon=\Omega(\sqrt{U/T})$ while LagrangeCBwLC requires a constant margin $\varepsilon=\Omega(1).$ These improvements stem from two novel techniques: violation-adaptive learning in E2D module and multi-step Lyapunov drift analysis in bounding constraint violation. The experiments further justify LOE2D outperforms the baseline algorithm.

Cite

Text

Guo and Liu. "Stochastic Constrained Contextual Bandits via Lyapunov Optimization Based Estimation to Decision Framework." Conference on Learning Theory, 2024.

Markdown

[Guo and Liu. "Stochastic Constrained Contextual Bandits via Lyapunov Optimization Based Estimation to Decision Framework." Conference on Learning Theory, 2024.](https://mlanthology.org/colt/2024/guo2024colt-stochastic/)

BibTeX

@inproceedings{guo2024colt-stochastic,
  title     = {{Stochastic Constrained Contextual Bandits via Lyapunov Optimization Based Estimation to Decision Framework}},
  author    = {Guo, Hengquan and Liu, Xin},
  booktitle = {Conference on Learning Theory},
  year      = {2024},
  pages     = {2204-2231},
  volume    = {247},
  url       = {https://mlanthology.org/colt/2024/guo2024colt-stochastic/}
}