A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs Work

Abstract

We present a bound on the generalisation error of linear classifiers in terms of a refined margin quantity on the training set. The result is obtained in a PAC-Bayesian framework and is based on geometrical arguments in the space of linear classifiers. The new bound constitutes an exponential improvement over the tightest margin bound so far, due to Shawe-Taylor et al. [8], and scales logarithmically in the inverse margin. Even in the case of fewer training examples than input dimensions, sufficiently large margins lead to non-trivial bound values and - for maximum margins - to a vanishing complexity term. Furthermore, the classical margin is too coarse a measure for the essential quantity that controls the generalisation error: the volume ratio between the whole hypothesis space and the subset of consistent hypotheses. The practical relevance of the result lies in the fact that the well-known support vector machine is optimal w.r.t. the new bound only if the feature vectors are all of the same length. As a consequence we recommend using SVMs on normalised feature vectors only - a recommendation that is well supported by our numerical experiments on two benchmark data sets.
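
The abstract's practical recommendation - train SVMs only on feature vectors of equal length - can be made concrete with a small sketch. The choice of scikit-learn, the breast-cancer benchmark, and the linear kernel below are illustrative assumptions, not the authors' experimental setup; the only point carried over from the paper is that every input vector is rescaled to unit Euclidean norm before training.

# Minimal sketch of the normalisation recommendation (assumed setup, not the paper's experiments).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import normalize
from sklearn.svm import SVC

# Load a benchmark data set (illustrative choice).
X, y = load_breast_cancer(return_X_y=True)

# Rescale every feature vector to unit Euclidean length, so ||x_i|| = 1 for all i.
X_unit = normalize(X, norm="l2")

X_tr, X_te, y_tr, y_te = train_test_split(X_unit, y, test_size=0.3, random_state=0)

# Train a linear SVM; the margin is now measured on vectors of equal length.
clf = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr)
print("test accuracy on normalised features:", clf.score(X_te, y_te))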

Cite

Text

Herbrich and Graepel. "A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs Work." Neural Information Processing Systems, 2000.

Markdown

[Herbrich and Graepel. "A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs Work." Neural Information Processing Systems, 2000.](https://mlanthology.org/neurips/2000/herbrich2000neurips-pacbayesian/)

BibTeX

@inproceedings{herbrich2000neurips-pacbayesian,
  title     = {{A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs Work}},
  author    = {Herbrich, Ralf and Graepel, Thore},
  booktitle = {Neural Information Processing Systems},
  year      = {2000},
  pages     = {224--230},
  url       = {https://mlanthology.org/neurips/2000/herbrich2000neurips-pacbayesian/}
}