Improved Second-Order Bounds for Prediction with Expert Advice

Abstract

This work studies external regret in sequential prediction games with arbitrary payoffs (nonnegative or non-positive). External regret measures the difference between the payoff obtained by the forecasting strategy and the payoff of the best action. We focus on two important parameters: $M$, the largest absolute value of any payoff, and $Q^*$, the sum of squared payoffs of the best action. Given these parameters, we first derive a simple new forecasting strategy with regret at most of order $\sqrt{Q^{*}\ln N}+M\ln N$, where $N$ is the number of actions. We extend the results to the case where the parameters are unknown and derive similar bounds. We then devise a refined analysis of the weighted majority forecaster, which yields bounds of the same flavour. The proof techniques we develop are finally applied to the adversarial multi-armed bandit setting, and we prove bounds on the performance of an online algorithm in the case where there is no lower bound on the probability of each action.
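For readers unfamiliar with the setting, the following is a minimal sketch of the standard exponentially weighted (weighted majority) forecaster whose analysis the paper refines. It is an illustration only, not the paper's second-order method: the fixed learning rate `eta`, the function name, and the payoff layout are all assumptions for the example.

```python
import math

def weighted_majority(payoffs, eta):
    """Exponentially weighted average forecaster (illustrative sketch).

    payoffs[t][i] is the payoff of action i in round t; eta > 0 is a
    fixed learning rate. Returns the forecaster's expected total payoff
    and its external regret against the best single action.
    """
    N = len(payoffs[0])          # number of actions
    cum = [0.0] * N              # cumulative payoff of each action
    total = 0.0                  # forecaster's expected cumulative payoff
    for round_payoffs in payoffs:
        # Weights proportional to exp(eta * cumulative payoff);
        # subtract the max for numerical stability.
        m = max(cum)
        w = [math.exp(eta * (c - m)) for c in cum]
        s = sum(w)
        p = [wi / s for wi in w]
        # Expected payoff of playing an action drawn from p this round.
        total += sum(pi * x for pi, x in zip(p, round_payoffs))
        cum = [c + x for c, x in zip(cum, round_payoffs)]
    regret = max(cum) - total    # external regret vs. best action
    return total, regret
```

With payoffs favouring one action throughout, the regret stays bounded while the best action's cumulative payoff grows, which is the behaviour the bounds above quantify in terms of $M$, $Q^*$, and $\ln N$.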

Cite

Text

Cesa-Bianchi et al. "Improved Second-Order Bounds for Prediction with Expert Advice." Annual Conference on Computational Learning Theory, 2005. doi:10.1007/11503415_15

Markdown

[Cesa-Bianchi et al. "Improved Second-Order Bounds for Prediction with Expert Advice." Annual Conference on Computational Learning Theory, 2005.](https://mlanthology.org/colt/2005/cesabianchi2005colt-improved/) doi:10.1007/11503415_15

BibTeX

@inproceedings{cesabianchi2005colt-improved,
  title     = {{Improved Second-Order Bounds for Prediction with Expert Advice}},
  author    = {Cesa-Bianchi, Nicolò and Mansour, Yishay and Stoltz, Gilles},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2005},
  pages     = {217-232},
  doi       = {10.1007/11503415_15},
  url       = {https://mlanthology.org/colt/2005/cesabianchi2005colt-improved/}
}