Conservative Contextual Bandits: Beyond Linear Representations
Abstract
Conservative Contextual Bandits (CCBs) address safety in sequential decision making by requiring that an agent's policy, along with minimizing regret, also satisfies a safety constraint: its performance is not worse than that of a baseline policy (e.g., the policy the company has in production) by more than a $(1+\alpha)$ factor. Prior work developed UCB-style algorithms for this problem in the multi-armed (Wu et al., 2016) and contextual linear (Kazerouni et al., 2017) settings. However, in practice the cost of the arms is often a non-linear function, and existing UCB algorithms are therefore ineffective in such settings. In this paper, we consider CCBs beyond the linear case and develop two algorithms, $\mathtt{C\text{-}SquareCB}$ and $\mathtt{C\text{-}FastCB}$, using Inverse Gap Weighting (IGW) based exploration and an online regression oracle. We show that the safety constraint is satisfied with high probability and that the regret of $\mathtt{C\text{-}SquareCB}$ is sub-linear in the horizon $T$, while the regret of $\mathtt{C\text{-}FastCB}$ is first-order and sub-linear in $L^*$, the cumulative loss of the optimal policy. Subsequently, we use a neural network for function approximation and online gradient descent as the regression oracle to provide $\tilde{\mathcal{O}}\big(\sqrt{KT} + K/\alpha\big)$ and $\tilde{\mathcal{O}}\big(\sqrt{KL^*} + K(1 + 1/\alpha)\big)$ regret bounds, respectively. Finally, we demonstrate the efficacy of our algorithms on real-world data and show that they significantly outperform the existing baseline while maintaining the performance guarantee.
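As a rough illustration of the exploration rule mentioned in the abstract, the sketch below shows standard Inverse Gap Weighting over a regression oracle's predicted losses for $K$ arms with learning-rate parameter $\gamma$. It is a minimal sketch, not the authors' algorithm: the conservative check against the baseline policy and the first-order reweighting of $\mathtt{C\text{-}FastCB}$ are omitted, and the function name, predictions, and parameter values are hypothetical.

```python
import numpy as np

def inverse_gap_weighting(predicted_losses: np.ndarray, gamma: float) -> np.ndarray:
    """Exploration distribution over K arms via Inverse Gap Weighting (IGW).

    Arms whose predicted loss is close to the greedy arm's receive more
    probability; the gap to the best prediction is scaled by `gamma`.
    """
    K = len(predicted_losses)
    best = int(np.argmin(predicted_losses))           # greedy arm under the oracle's prediction
    gaps = predicted_losses - predicted_losses[best]  # non-negative gaps to the best prediction
    probs = 1.0 / (K + gamma * gaps)                  # IGW weight for each non-greedy arm
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()                   # remaining mass goes to the greedy arm
    return probs

# Usage: sample an arm for one context from hypothetical oracle predictions.
rng = np.random.default_rng(0)
losses_hat = np.array([0.40, 0.10, 0.35, 0.90])
p = inverse_gap_weighting(losses_hat, gamma=50.0)
arm = rng.choice(len(p), p=p)
```

A conservative variant would additionally track a lower confidence bound on the played policy's cumulative performance and fall back to the baseline arm whenever the $(1+\alpha)$ constraint could be violated; that bookkeeping is not shown here.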
Cite
Text
Deb et al. "Conservative Contextual Bandits: Beyond Linear Representations." International Conference on Learning Representations, 2025.

Markdown

[Deb et al. "Conservative Contextual Bandits: Beyond Linear Representations." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/deb2025iclr-conservative/)

BibTeX
@inproceedings{deb2025iclr-conservative,
title = {{Conservative Contextual Bandits: Beyond Linear Representations}},
author = {Deb, Rohan and Ghavamzadeh, Mohammad and Banerjee, Arindam},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/deb2025iclr-conservative/}
}