Stay with Me: Lifetime Maximization Through Heteroscedastic Linear Bandits with Reneging

Abstract

Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection. In these applications, a “reneging” phenomenon, where participants may disengage from future interactions after observing an unsatisfactory outcome, is rather prevalent. To address this issue, this paper proposes a model of heteroscedastic linear bandits with reneging, which allows each participant to have a distinct “satisfaction level,” with any interaction outcome falling short of that level causing the participant to renege. Moreover, it allows the variance of the outcome to be context-dependent. Based on this model, we develop a UCB-type policy, namely HR-UCB, and prove that it achieves $\mathcal{O}\big(\sqrt{T(\log T)^{3}}\big)$ regret. Finally, we validate the performance of HR-UCB via simulations.
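
To make the reneging dynamic concrete, below is a minimal Python sketch of the interaction model described in the abstract: a linear mean outcome, context-dependent (heteroscedastic) noise, and a participant who leaves as soon as one outcome falls below their satisfaction level. The parameters theta, w, the linear form of noise_std, and the satisfaction threshold are placeholders chosen for illustration, not the paper's exact specification, and the HR-UCB policy itself is not reproduced here.

import numpy as np

# Hypothetical simulation of one participant in a heteroscedastic linear
# bandit with reneging. All parameter choices below are illustrative only.

rng = np.random.default_rng(0)

d = 5                                   # context dimension (assumed)
theta = rng.normal(size=d)              # unknown mean-reward parameter (assumed)
theta /= np.linalg.norm(theta)
w = rng.normal(size=d)                  # separate parameter governing the noise level (assumed)
w /= np.linalg.norm(w)
satisfaction_level = 0.0                # participant reneges if an outcome falls below this


def noise_std(x):
    """Context-dependent noise level; the affine form used here is only an
    assumption made for this sketch."""
    return 0.5 + 0.5 * abs(x @ w)


def interact(x):
    """Noisy outcome for context/action x: linear mean plus heteroscedastic noise."""
    return x @ theta + noise_std(x) * rng.normal()


# Lifetime of the participant under a naive policy that plays random
# unit-norm contexts: the episode ends at the first unsatisfactory outcome.
lifetime = 0
while True:
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    if interact(x) < satisfaction_level:
        break                           # participant reneges and disengages
    lifetime += 1

print("participant stayed for", lifetime, "interactions before reneging")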

Cite

Text

Hsieh et al. "Stay with Me: Lifetime Maximization Through Heteroscedastic Linear Bandits with Reneging." International Conference on Machine Learning, 2019.

Markdown

[Hsieh et al. "Stay with Me: Lifetime Maximization Through Heteroscedastic Linear Bandits with Reneging." International Conference on Machine Learning, 2019.](https://mlanthology.org/icml/2019/hsieh2019icml-stay/)

BibTeX

@inproceedings{hsieh2019icml-stay,
  title     = {{Stay with Me: Lifetime Maximization Through Heteroscedastic Linear Bandits with Reneging}},
  author    = {Hsieh, Ping-Chun and Liu, Xi and Bhattacharya, Anirban and Kumar, P. R.},
  booktitle = {International Conference on Machine Learning},
  year      = {2019},
  pages     = {2800--2809},
  volume    = {97},
  url       = {https://mlanthology.org/icml/2019/hsieh2019icml-stay/}
}