Dynamic Planning and Learning Under Recovering Rewards

Abstract

Motivated by emerging applications such as live-streaming e-commerce, promotions and recommendations, we introduce a general class of multi-armed bandit problems that have the following two features: (i) the decision maker can pull and collect rewards from at most $K$ out of $N$ different arms in each time period; (ii) the expected reward of an arm immediately drops after it is pulled, and then non-parametrically recovers as the idle time increases. With the objective of maximizing expected cumulative rewards over $T$ time periods, we propose, construct and prove performance guarantees for a class of “Purely Periodic Policies”. For the offline problem when all model parameters are known, our proposed policy obtains an approximation ratio that is at the order of $1-\mathcal O(1/\sqrt{K})$, which is asymptotically optimal when $K$ grows to infinity. For the online problem when the model parameters are unknown and need to be learned, we design an Upper Confidence Bound (UCB) based policy that approximately has $\widetilde{\mathcal O}(N\sqrt{T})$ regret against the offline benchmark. Our framework and policy design may have the potential to be adapted into other offline planning and online learning applications with non-stationary and recovering rewards.

Cite

Text

Simchi-Levi et al. "Dynamic Planning and Learning Under Recovering Rewards." International Conference on Machine Learning, 2021.

Markdown

[Simchi-Levi et al. "Dynamic Planning and Learning Under Recovering Rewards." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/simchilevi2021icml-dynamic/)

BibTeX

@inproceedings{simchilevi2021icml-dynamic,
  title     = {{Dynamic Planning and Learning Under Recovering Rewards}},
  author    = {Simchi-Levi, David and Zheng, Zeyu and Zhu, Feng},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {9702-9711},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/simchilevi2021icml-dynamic/}
}