The Convergence Rate of AdaBoost

Abstract

We pose the problem of determining the rate of convergence at which AdaBoost minimizes exponential loss. Boosting is the problem of combining many "weak," high-error hypotheses to generate a single "strong" hypothesis with very low error. The AdaBoost algorithm of Freund and Schapire (1997) is shown in Figure 1. Here we are given m labeled training examples (x1, y1), ..., (xm, ym), where the xi's are in some domain X and the labels yi ∈ {−1, +1}. On each round t, a distribution Dt is computed as in the figure over the m training examples, and a weak hypothesis ht : X → {−1, +1} is found, where our aim is to find a weak hypothesis with low weighted error εt relative to Dt. In particular, for simplicity, we assume that ht minimizes the weighted error over all hypotheses belonging to some finite class of weak hypotheses H = {ħ1, ..., ħN}. The final hypothesis H computes the sign of a weighted combination of weak hypotheses F(x) = ∑_{t=1}^{T} αt ht(x). Since each ht is equal to ħ_{jt} for some jt, this can also be rewritten as F(x) = ∑_{j=1}^{N} λj ħj(x) for some set of values λ = ⟨λ1, ..., λN⟩. It was observed by Breiman (1999) and others (Frean & Downs, …
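The procedure the abstract describes — maintain a distribution Dt over examples, pick the weak hypothesis with lowest weighted error εt from a finite class, weight it by αt, and reweight the examples — can be sketched in a few lines. This is a minimal illustration, not the paper's analysis; the 1-D threshold "stumps" standing in for the weak-hypothesis class H, and the function names, are assumptions for the example:

```python
import math

def adaboost(X, y, hypotheses, T):
    """Minimal AdaBoost sketch over a finite weak-hypothesis class.

    X: list of inputs; y: labels in {-1, +1}; hypotheses: finite class H
    (callables x -> {-1, +1}); T: number of rounds.  Returns F, the
    weighted combination F(x) = sum_t alpha_t * h_t(x); the final
    classifier is sign(F(x)).
    """
    m = len(X)
    D = [1.0 / m] * m                       # D_1: uniform over examples
    alphas, chosen = [], []
    for _ in range(T):
        # Choose h_t minimizing the weighted error eps_t w.r.t. D_t.
        errs = [sum(D[i] for i in range(m) if h(X[i]) != y[i])
                for h in hypotheses]
        eps, h = min(zip(errs, hypotheses), key=lambda p: p[0])
        eps = max(eps, 1e-10)               # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1.0 - eps) / eps)   # alpha_t
        alphas.append(alpha)
        chosen.append(h)
        # D_{t+1}(i) proportional to D_t(i) * exp(-alpha_t * y_i * h_t(x_i))
        D = [D[i] * math.exp(-alpha * y[i] * h(X[i])) for i in range(m)]
        Z = sum(D)
        D = [d / Z for d in D]
    return lambda x: sum(a * h(x) for a, h in zip(alphas, chosen))
```

Each round multiplies the (unnormalized) exponential loss (1/m) ∑i exp(−yi F(xi)) by the normalizer Zt, which is what makes AdaBoost a greedy coordinate-wise procedure for minimizing that loss — the view attributed above to Breiman (1999) and others.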

Cite

Text

Schapire. "The Convergence Rate of AdaBoost." Annual Conference on Computational Learning Theory, 2010.

Markdown

[Schapire. "The Convergence Rate of AdaBoost." Annual Conference on Computational Learning Theory, 2010.](https://mlanthology.org/colt/2010/schapire2010colt-convergence/)

BibTeX

@inproceedings{schapire2010colt-convergence,
  title     = {{The Convergence Rate of AdaBoost}},
  author    = {Schapire, Robert E.},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2010},
  pages     = {308-309},
  url       = {https://mlanthology.org/colt/2010/schapire2010colt-convergence/}
}