The Rate of Convergence of AdaBoost
Abstract
The AdaBoost algorithm of Freund and Schapire (1997) was designed to combine many “weak” hypotheses that perform slightly better than random guessing into a “strong” hypothesis with very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the “exponential loss.” Our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponential loss be finite. Specifically, our first result shows that the exponential loss of AdaBoost’s computed parameter vector will be at most $\varepsilon$ more than that of any parameter vector of $\ell_1$-norm bounded by $B$ within a number of rounds bounded by a polynomial in $B$ and $1/\varepsilon$. We also provide lower-bound examples showing that a polynomial dependence on these parameters is necessary. Our second result is that within $C/\varepsilon$ iterations, AdaBoost achieves a value of the exponential loss that is at most $\varepsilon$ more than the best possible value, where $C$ depends on the dataset. We show that this dependence of the rate on $\varepsilon$ is optimal up to constant factors, i.e., at least $\Omega(1/\varepsilon)$ rounds are necessary to come within $\varepsilon$ of the optimal exponential loss.
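
To make the objective concrete, here is a minimal, illustrative sketch (not code from the paper) of AdaBoost viewed as greedy coordinate updates on the exponential loss $\frac{1}{m}\sum_i \exp\bigl(-y_i \sum_j \lambda_j h_j(x_i)\bigr)$. It assumes the weak hypotheses form a fixed finite pool given as a $\pm 1$ matrix; the names `adaboost_exp_loss`, `H`, and `lam`, and the normalization by $1/m$, are conventions chosen for this sketch rather than notation taken from the paper.

```python
import numpy as np

def adaboost_exp_loss(H, y, rounds):
    """AdaBoost over a fixed pool of +/-1 weak hypotheses (columns of H).

    H : (m, N) array with H[i, j] = h_j(x_i) in {-1, +1}
    y : (m,) array of labels in {-1, +1}
    Returns the coefficient vector lam and the exponential loss after each round.
    """
    m, N = H.shape
    margins = y[:, None] * H              # margins[i, j] = y_i * h_j(x_i)
    lam = np.zeros(N)                     # coefficients lambda of the combined hypothesis
    f = np.zeros(m)                       # f[i] = y_i * F(x_i), current margin of example i
    losses = []
    for _ in range(rounds):
        w = np.exp(-f)
        w /= w.sum()                      # AdaBoost's distribution over training examples
        edges = w @ margins               # edge of each weak hypothesis under w
        j = int(np.argmax(np.abs(edges))) # greedily pick the largest-|edge| hypothesis
        gamma = float(np.clip(edges[j], -1 + 1e-12, 1 - 1e-12))
        alpha = 0.5 * np.log((1 + gamma) / (1 - gamma))   # AdaBoost's step size
        lam[j] += alpha
        f += alpha * margins[:, j]
        losses.append(float(np.exp(-f).mean()))  # (1/m) sum_i exp(-y_i F(x_i))
    return lam, losses

# Tiny synthetic run: the exponential loss is non-increasing from round to round.
rng = np.random.default_rng(0)
H = rng.choice([-1.0, 1.0], size=(200, 50))
y = rng.choice([-1.0, 1.0], size=200)
lam, losses = adaboost_exp_loss(H, y, rounds=100)
print(losses[0], losses[-1])
```

With this step size, a single round multiplies the exponential loss by $\sqrt{1-\gamma_t^2}$, where $\gamma_t$ is the chosen edge; without a weak-learning assumption bounding $\gamma_t$ away from zero, this per-round factor alone gives no rate, which is the gap the paper's analysis addresses.
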
Cite
Text
Mukherjee et al. "The Rate of Convergence of AdaBoost." Proceedings of the 24th Annual Conference on Learning Theory, 2011.
Markdown
[Mukherjee et al. "The Rate of Convergence of AdaBoost." Proceedings of the 24th Annual Conference on Learning Theory, 2011.](https://mlanthology.org/colt/2011/mukherjee2011colt-rate/)
BibTeX
@inproceedings{mukherjee2011colt-rate,
title = {{The Rate of Convergence of AdaBoost}},
author = {Mukherjee, Indraneel and Rudin, Cynthia and Schapire, Robert E.},
booktitle = {Proceedings of the 24th Annual Conference on Learning Theory},
year = {2011},
pages = {537--558},
volume = {19},
url = {https://mlanthology.org/colt/2011/mukherjee2011colt-rate/}
}