Tracking the Best Expert in Non-Stationary Stochastic Environments

Abstract

We study the dynamic regret of the multi-armed bandit and experts problems in non-stationary stochastic environments. We introduce a new parameter $\Lambda$, which measures the total statistical variance of the loss distributions over $T$ rounds of the process, and study how this quantity affects the regret. We investigate the interaction between $\Lambda$ and $\Gamma$, which counts the number of times the distributions change, as well as between $\Lambda$ and $V$, which measures how far the distributions deviate over time. One striking result we find is that even when $\Gamma$, $V$, and $\Lambda$ are all restricted to constants, the regret lower bound in the bandit setting still grows with $T$. The other highlight is that in the full-information setting, a constant regret, independent of $T$, becomes achievable with constant $\Gamma$ and $\Lambda$, while with constant $V$ and $\Lambda$ the regret still has a $T^{1/3}$ dependency. We not only propose algorithms with upper bound guarantees, but prove their matching lower bounds as well.
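One common way to formalize the quantities named in the abstract is sketched below; the notation (loss vector $\ell_t$ drawn from the round-$t$ distribution with mean $\mu_t$, learner's action $a_t$) is assumed here, and the paper's exact norms and indexing may differ:

$$\Lambda = \sum_{t=1}^{T} \mathbb{E}\left[\|\ell_t - \mu_t\|_\infty^2\right], \qquad \Gamma = 1 + \sum_{t=2}^{T} \mathbf{1}\{\mu_t \neq \mu_{t-1}\}, \qquad V = \sum_{t=2}^{T} \|\mu_t - \mu_{t-1}\|_\infty,$$

with the dynamic regret measured against the best action of each individual round rather than a single fixed action:

$$\mathcal{R}(T) = \mathbb{E}\left[\sum_{t=1}^{T} \ell_t(a_t)\right] - \sum_{t=1}^{T} \min_{a} \mu_t(a).$$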

Cite

Text

Wei et al. "Tracking the Best Expert in Non-Stationary Stochastic Environments." Neural Information Processing Systems, 2016.

Markdown

[Wei et al. "Tracking the Best Expert in Non-Stationary Stochastic Environments." Neural Information Processing Systems, 2016.](https://mlanthology.org/neurips/2016/wei2016neurips-tracking/)

BibTeX

@inproceedings{wei2016neurips-tracking,
  title     = {{Tracking the Best Expert in Non-Stationary Stochastic Environments}},
  author    = {Wei, Chen-Yu and Hong, Yi-Te and Lu, Chi-Jen},
  booktitle = {Neural Information Processing Systems},
  year      = {2016},
  pages     = {3972--3980},
  url       = {https://mlanthology.org/neurips/2016/wei2016neurips-tracking/}
}