Regret Bound for Safe Gaussian Process Bandit Optimization
Abstract
Many applications require a learner to make sequential decisions under uncertainty about both the system's payoff function and its safety constraints. When learning algorithms are deployed in safety-critical systems, it is paramount that the learner's actions do not violate the safety constraints at any stage of the learning process. In this paper, we study a stochastic bandit optimization problem in which the system's unknown payoff and constraint functions are sampled from Gaussian Processes (GPs). We develop a safe variant of the GP-UCB algorithm proposed by Srinivas et al. (2010), called SGP-UCB, with the necessary modifications to respect safety constraints at every round. Our most important contribution is to derive the first sub-linear regret bounds for this problem.
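The setting described in the abstract can be sketched as follows: at each round, fit separate GP posteriors to the observed payoffs and constraint values, restrict attention to arms whose pessimistic (upper) constraint bound certifies safety, and play the arm with the highest optimistic payoff bound within that set. This is a minimal illustrative sketch, not the paper's exact SGP-UCB algorithm: the kernel, the confidence width `beta`, the threshold `h`, and the toy payoff/constraint functions are all assumptions made here for demonstration.

```python
# Hypothetical sketch of one safe GP-UCB-style loop over a finite arm set,
# assuming a constraint of the form g(x) <= h must hold at every round.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
arms = np.linspace(0.0, 1.0, 50).reshape(-1, 1)  # finite decision set
f = lambda x: np.sin(3 * x)                      # unknown payoff (hidden from learner)
g = lambda x: x                                  # unknown constraint (hidden from learner)
h = 0.7                                          # safety threshold: require g(x) <= h
beta = 2.0                                       # confidence width (assumed schedule)

# Safe exploration needs a known-safe seed point to start from.
X = np.array([[0.1]])
yf = f(X).ravel() + 0.01 * rng.standard_normal(1)
yg = g(X).ravel() + 0.01 * rng.standard_normal(1)

for t in range(20):
    gp_f = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-4).fit(X, yf)
    gp_g = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-4).fit(X, yg)
    mu_f, sd_f = gp_f.predict(arms, return_std=True)
    mu_g, sd_g = gp_g.predict(arms, return_std=True)
    safe = mu_g + beta * sd_g <= h               # pessimistic (conservative) safe set
    if not safe.any():
        break                                    # no arm certified safe; stop
    ucb = np.where(safe, mu_f + beta * sd_f, -np.inf)
    x = arms[int(np.argmax(ucb))].reshape(1, -1) # optimistic pick within the safe set
    X = np.vstack([X, x])
    yf = np.append(yf, f(x).ravel() + 0.01 * rng.standard_normal(1))
    yg = np.append(yg, g(x).ravel() + 0.01 * rng.standard_normal(1))

print(float(X[-1, 0]))  # last arm selected
```

The key design point this sketch illustrates is the asymmetry in how the two GPs are used: the constraint GP is read pessimistically (its upper bound must clear the threshold before an arm is even eligible), while the payoff GP is read optimistically to drive exploration, which is what keeps safety violations off the table during learning.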
Cite
Text
Amani et al. "Regret Bound for Safe Gaussian Process Bandit Optimization." Proceedings of the 2nd Conference on Learning for Dynamics and Control, 2020.
Markdown
[Amani et al. "Regret Bound for Safe Gaussian Process Bandit Optimization." Proceedings of the 2nd Conference on Learning for Dynamics and Control, 2020.](https://mlanthology.org/l4dc/2020/amani2020l4dc-regret/)
BibTeX
@inproceedings{amani2020l4dc-regret,
title = {{Regret Bound for Safe Gaussian Process Bandit Optimization}},
author = {Amani, Sanae and Alizadeh, Mahnoosh and Thrampoulidis, Christos},
booktitle = {Proceedings of the 2nd Conference on Learning for Dynamics and Control},
year = {2020},
pages = {158-159},
volume = {120},
url = {https://mlanthology.org/l4dc/2020/amani2020l4dc-regret/}
}