Regret Bounds for Satisficing in Multi-Armed Bandit Problems
Abstract
This paper considers the objective of \textit{satisficing} in multi-armed bandit problems. Instead of aiming to find an optimal arm, the learner is content with an arm whose reward is above a given satisfaction level. We provide algorithms and analysis both for the realizable case, in which such a satisficing arm exists, and for the general case, in which it may not. Introducing the notion of \textit{satisficing regret}, our main result shows that in the general case it is possible to obtain constant satisficing regret whenever a satisficing arm exists (thereby correcting a contrary claim in the literature), while standard logarithmic regret bounds can be re-established otherwise. Experiments illustrate that our algorithm is not only superior to standard algorithms in the satisficing setting, but also performs well in the classic bandit setting.
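To make the objective concrete, one natural formalization of satisficing regret is sketched below; the notation is an assumption made here for illustration, and the paper should be consulted for the authors' precise definition. Given a satisfaction level $S$ and letting $\mu_{a_t}$ denote the mean reward of the arm $a_t$ played at step $t$, the satisficing regret after $T$ steps can be written as

$$\mathcal{R}^S_T \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T} \bigl(S - \mu_{a_t}\bigr)^{+}\right], \qquad (x)^{+} := \max\{x,\,0\}.$$

Under this formalization, pulls of any satisficing arm (one with $\mu_a \ge S$) contribute nothing to the sum, which is why constant satisficing regret becomes achievable once such an arm is identified and committed to.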
Cite

Text
Michel et al. "Regret Bounds for Satisficing in Multi-Armed Bandit Problems." Transactions on Machine Learning Research, 2023.

Markdown
[Michel et al. "Regret Bounds for Satisficing in Multi-Armed Bandit Problems." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/michel2023tmlr-regret/)

BibTeX
@article{michel2023tmlr-regret,
title = {{Regret Bounds for Satisficing in Multi-Armed Bandit Problems}},
author = {Michel, Thomas and Hajiabolhassan, Hossein and Ortner, Ronald},
journal = {Transactions on Machine Learning Research},
year = {2023},
url = {https://mlanthology.org/tmlr/2023/michel2023tmlr-regret/}
}