Optimum Follow the Leader Algorithm
Abstract
Consider the following setting for an on-line algorithm (introduced in [FS97]) that learns from a set of experts: in trial $t$ the algorithm chooses an expert with probability $p^{t}_{i}$. At the end of the trial a loss vector $L^{t} \in [0, R]^{n}$ for the $n$ experts is received and an expected loss of $\sum_{i} p^{t}_{i} L^{t}_{i}$ is incurred. A simple algorithm for this setting is the Hedge algorithm, which uses the probabilities $p^{t}_{i} \sim \exp(-\eta L^{<t}_{i})$. This algorithm and its analysis are a simple reformulation of the randomized version of the Weighted Majority algorithm (WMR) [LW94], which was designed for the absolute loss. The total expected loss of the algorithm is close to the total loss of the best expert, $L_{*} = \min_{i} L^{\leq T}_{i}$. That is, when the learning rate is optimally tuned based on $L_{*}$, $R$ and $n$, the total expected loss of the Hedge/WMR algorithm is at most $L_{*} + \sqrt{2}\sqrt{L_{*} R \log n} + O(\log n)$. The factor of $\sqrt{2}$ is in some sense optimal [Vov97].
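The Hedge update described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names `hedge_probs` and `run_hedge` are my own, and the learning rate `eta` is left as a plain parameter rather than tuned from $L_{*}$, $R$ and $n$ as in the bound.

```python
import math

def hedge_probs(cum_losses, eta):
    """Hedge probabilities: p_i proportional to exp(-eta * L_i^{<t}),
    where cum_losses[i] is expert i's cumulative loss before trial t."""
    weights = [math.exp(-eta * L) for L in cum_losses]
    total = sum(weights)
    return [w / total for w in weights]

def run_hedge(loss_vectors, eta):
    """Play Hedge over a sequence of loss vectors L^t in [0, R]^n.
    Returns the total expected loss: sum over t of sum_i p_i^t L_i^t."""
    n = len(loss_vectors[0])
    cum = [0.0] * n          # L_i^{<t}: cumulative loss of each expert so far
    total_expected_loss = 0.0
    for L in loss_vectors:
        p = hedge_probs(cum, eta)
        total_expected_loss += sum(pi * Li for pi, Li in zip(p, L))
        cum = [c + Li for c, Li in zip(cum, L)]
    return total_expected_loss
```

With a large `eta` the weights concentrate quickly on the expert with the smallest cumulative loss; the bound quoted above governs how close the total expected loss stays to $L_{*}$ under the optimally tuned rate.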
Cite
Kuzmin and Warmuth. "Optimum Follow the Leader Algorithm." Annual Conference on Computational Learning Theory, 2005. doi:10.1007/11503415_46
@inproceedings{kuzmin2005colt-optimum,
title = {{Optimum Follow the Leader Algorithm}},
author = {Kuzmin, Dima and Warmuth, Manfred K.},
booktitle = {Annual Conference on Computational Learning Theory},
year = {2005},
pages = {684-686},
doi = {10.1007/11503415_46},
url = {https://mlanthology.org/colt/2005/kuzmin2005colt-optimum/}
}