Online Learning for Active Cache Synchronization
Abstract
Existing multi-armed bandit (MAB) models make two implicit assumptions: an arm generates a payoff only when it is played, and the agent observes every payoff that is generated. This paper introduces synchronization bandits, a MAB variant where all arms generate costs at all times, but the agent observes an arm’s instantaneous cost only when the arm is played. Synchronization MABs are inspired by online caching scenarios such as Web crawling, where an arm corresponds to a cached item and playing the arm means downloading its fresh copy from a server. We present MirrorSync, an online learning algorithm for synchronization bandits, establish an adversarial regret of $O(T^{2/3})$ for it, and show how to make it practical.
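To make the feedback model concrete, here is a minimal toy sketch of a synchronization-bandit environment, not the paper's formal model: every arm accrues a (hypothetical) staleness cost at every step, but the agent observes an arm's instantaneous cost only when it plays that arm, and playing resets the cost, mimicking a fresh download. All names and the linear cost process are illustrative assumptions.

```python
import random

class SyncBanditEnv:
    """Toy synchronization-bandit environment (illustrative only).

    All arms generate costs at all times, but the agent observes an
    arm's instantaneous cost only when that arm is played.
    """

    def __init__(self, change_rates, seed=0):
        self.rng = random.Random(seed)
        self.change_rates = change_rates          # per-arm cost growth rates (hypothetical)
        self.staleness = [0.0] * len(change_rates)

    def step(self, arm):
        # Every arm incurs cost this round, whether observed or not.
        total_hidden_cost = 0.0
        for i, rate in enumerate(self.change_rates):
            self.staleness[i] += rate             # cost keeps growing while an item stays stale
            total_hidden_cost += self.staleness[i]
        observed = self.staleness[arm]            # feedback: only the played arm's cost is seen
        self.staleness[arm] = 0.0                 # playing = downloading a fresh copy
        return observed, total_hidden_cost

env = SyncBanditEnv([0.1, 0.5, 0.2])
observed, total = env.step(1)                    # play (sync) arm 1
```

The agent's goal would then be to schedule plays so as to keep `total_hidden_cost` low over time, even though it only ever sees the `observed` component.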
Cite
Text
Kolobov et al. "Online Learning for Active Cache Synchronization." International Conference on Machine Learning, 2020.

Markdown

[Kolobov et al. "Online Learning for Active Cache Synchronization." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/kolobov2020icml-online/)

BibTeX
@inproceedings{kolobov2020icml-online,
title = {{Online Learning for Active Cache Synchronization}},
author = {Kolobov, Andrey and Bubeck, Sebastien and Zimmert, Julian},
booktitle = {International Conference on Machine Learning},
year = {2020},
pages = {5371--5380},
volume = {119},
url = {https://mlanthology.org/icml/2020/kolobov2020icml-online/}
}