Online Learning with Bounded Recall

Abstract

We study the problem of full-information online learning in the “bounded recall” setting popular in the study of repeated games. An online learning algorithm $\mathcal{A}$ is $M$-bounded-recall if its output at time $t$ can be written as a function of the $M$ previous rewards (and not, e.g., any other internal state of $\mathcal{A}$). We first demonstrate that a natural approach to constructing bounded-recall algorithms from mean-based no-regret learning algorithms (e.g., running Hedge over the last $M$ rounds) fails, and that any such algorithm incurs constant regret per round. We then construct a stationary bounded-recall algorithm that achieves a per-round regret of $\Theta(1/\sqrt{M})$, which we complement with a tight lower bound. Finally, we show that, unlike in the perfect-recall setting, any low-regret bounded-recall algorithm must be aware of the ordering of the past $M$ losses: any bounded-recall algorithm that plays a symmetric function of the past $M$ losses must incur constant regret per round.
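For intuition, here is a minimal Python sketch of the naive construction the abstract describes as failing: running Hedge (exponential weights) over only the last $M$ observed loss vectors, so the output at each round depends on nothing but that window. The learning rate `eta`, the function name `hedge_last_m`, and the toy loss stream are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch (not the paper's algorithm): Hedge restricted to a
# sliding window of the last m loss vectors, making it m-bounded-recall.
from collections import deque

import numpy as np


def hedge_last_m(losses, m, eta):
    """Yield a distribution over actions at each round, computed only
    from the previous m loss vectors (an m-bounded-recall algorithm)."""
    n_actions = len(losses[0])
    window = deque(maxlen=m)  # bounded recall: only the last m losses survive
    for loss in losses:
        # Cumulative loss over the window only, not the full history.
        total = np.sum(window, axis=0) if window else np.zeros(n_actions)
        weights = np.exp(-eta * total)
        yield weights / weights.sum()  # play before seeing this round's loss
        window.append(np.asarray(loss, dtype=float))


# Toy usage with two actions and random losses; per the abstract's first
# result, this style of algorithm incurs constant regret per round.
rng = np.random.default_rng(0)
stream = [rng.random(2) for _ in range(100)]
for t, p in enumerate(hedge_last_m(stream, m=10, eta=0.3)):
    if t < 3:
        print(t, p)
```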

Cite

Text

Schneider and Vodrahalli. "Online Learning with Bounded Recall." International Conference on Machine Learning, 2024.

Markdown

[Schneider and Vodrahalli. "Online Learning with Bounded Recall." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/schneider2024icml-online/)

BibTeX

@inproceedings{schneider2024icml-online,
  title     = {{Online Learning with Bounded Recall}},
  author    = {Schneider, Jon and Vodrahalli, Kiran},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {43791--43803},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/schneider2024icml-online/}
}