Expected Error Analysis for Model Selection

Scheffer, Tobias; Joachims, Thorsten

ICML 1999 pp. 361-370

Abstract

In order to select a good hypothesis language (or model) from a collection of possible models, one has to assess the generalization performance of the hypothesis which is returned by a learner that is bound to use some particular model. This paper deals with a new and very efficient way of assessing this generalization performance. We present a new analysis which characterizes the expected generalization error of the hypothesis with least training error in terms of the distribution of error rates of the hypotheses in the model. This distribution can be estimated very efficiently from the data which immediately leads to an efficient model selection algorithm. The analysis predicts learning curves with a very high precision and thus contributes to a better understanding of why and when over-fitting occurs. We present empirical studies (controlled experiments on Boolean decision trees and a large-scale text categorization problem) which show that the model selection algorithm leads to err...
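The central quantity in the abstract, the expected generalization error of the hypothesis with least training error, can be illustrated with a small Monte Carlo simulation. The sketch below is not the paper's analysis or algorithm: it assumes that the true error rates of all hypotheses in the model are known, that training errors of different hypotheses are independent binomial draws, and that ties are broken uniformly at random. The function names `sample_training_errors` and `expected_error_of_best` are hypothetical.

```python
import random


def sample_training_errors(true_error, m, rng):
    """Count mistakes of one hypothesis on m i.i.d. training examples.

    Assumption of this sketch: each example is misclassified
    independently with probability equal to the true error rate.
    """
    return sum(rng.random() < true_error for _ in range(m))


def expected_error_of_best(true_errors, m, trials=2000, seed=0):
    """Monte Carlo estimate of the expected true error of the
    hypothesis with least training error.

    true_errors: true error rate of every hypothesis in the model
                 (assumed known here; in practice it must be estimated).
    m:           training sample size.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Simulate the training error count of every hypothesis.
        counts = [sample_training_errors(e, m, rng) for e in true_errors]
        best = min(counts)
        # Break ties between empirically best hypotheses at random.
        winners = [e for e, c in zip(true_errors, counts) if c == best]
        total += rng.choice(winners)
    return total / trials


if __name__ == "__main__":
    # Hypothetical model: one good hypothesis among many chance-level ones.
    model = [0.4] + [0.5] * 50
    for m in (5, 50, 400):
        estimate = expected_error_of_best(model, m, trials=200)
        print(f"m={m:4d}  estimated expected error of selected hypothesis: {estimate:.3f}")
```

With few training examples, a chance-level hypothesis often attains the lowest training error by luck, so the selected hypothesis tends to have a higher true error than the best one in the model; this is the over-fitting effect that depends on the distribution of error rates.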

Cite

Text

Scheffer and Joachims. "Expected Error Analysis for Model Selection." International Conference on Machine Learning, 1999.

Markdown

[Scheffer and Joachims. "Expected Error Analysis for Model Selection." International Conference on Machine Learning, 1999.](https://mlanthology.org/icml/1999/scheffer1999icml-expected/)

BibTeX

@inproceedings{scheffer1999icml-expected,
  title     = {{Expected Error Analysis for Model Selection}},
  author    = {Scheffer, Tobias and Joachims, Thorsten},
  booktitle = {International Conference on Machine Learning},
  year      = {1999},
  pages     = {361--370},
  url       = {https://mlanthology.org/icml/1999/scheffer1999icml-expected/}
}