Expected Error Analysis for Model Selection
Abstract
To select a good hypothesis language (or model) from a collection of possible models, one has to assess the generalization performance of the hypothesis returned by a learner that is bound to use a particular model. This paper presents a new and very efficient way of assessing this generalization performance. We present a new analysis that characterizes the expected generalization error of the hypothesis with the least training error in terms of the distribution of error rates of the hypotheses in the model. This distribution can be estimated very efficiently from the data, which immediately leads to an efficient model selection algorithm. The analysis predicts learning curves with very high precision and thus contributes to a better understanding of why and when over-fitting occurs. We present empirical studies (controlled experiments on Boolean decision trees and a large-scale text categorization problem) which show that the model selection algorithm leads to err...
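The quantity at the center of the abstract, the expected generalization error of the hypothesis with least training error, can be illustrated with a small Monte Carlo simulation. The sketch below is not the paper's analysis (which is analytical and not reproduced on this page). It assumes a finite list of hypotheses with known true error rates, and it assumes that each hypothesis's training errors are independent Bernoulli draws, independent across hypotheses. The function name `expected_error_of_erm` and its parameters are hypothetical.

```python
import random


def expected_error_of_erm(true_errors, m, trials=500, seed=0):
    """Monte Carlo estimate for the hypothesis with least training error.

    true_errors: true error rate of each hypothesis in the model (assumed known).
    m: number of training examples.
    Returns (mean training error, mean true error) of the selected hypothesis.
    Assumption: training mistakes are independent across examples and hypotheses.
    """
    rng = random.Random(seed)
    total_train = 0.0
    total_true = 0.0
    for _ in range(trials):
        best_train = None
        best_true = None
        for e in true_errors:
            # Simulate m training examples; each is misclassified with probability e.
            mistakes = sum(1 for _ in range(m) if rng.random() < e)
            train = mistakes / m
            # Keep the hypothesis with the lowest training error (first one wins ties).
            if best_train is None or train < best_train:
                best_train, best_true = train, e
        total_train += best_train
        total_true += best_true
    return total_train / trials, total_true / trials


if __name__ == "__main__":
    rng = random.Random(1)
    # Two hypothetical models: a small one and a larger one that contains it.
    small = [rng.uniform(0.2, 0.5) for _ in range(5)]
    large = small + [rng.uniform(0.2, 0.5) for _ in range(95)]
    for name, model in (("small", small), ("large", large)):
        train, true = expected_error_of_erm(model, m=30)
        print(f"{name}: training error {train:.3f}, true error {true:.3f}")
```

Comparing the two returned values for models of different sizes shows how the gap between training error and true error of the selected hypothesis depends on the distribution of error rates in the model, which is the dependence the abstract describes.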
Cite
Text
Scheffer and Joachims. "Expected Error Analysis for Model Selection." International Conference on Machine Learning, 1999.
Markdown
[Scheffer and Joachims. "Expected Error Analysis for Model Selection." International Conference on Machine Learning, 1999.](https://mlanthology.org/icml/1999/scheffer1999icml-expected/)
BibTeX
@inproceedings{scheffer1999icml-expected,
title = {{Expected Error Analysis for Model Selection}},
author = {Scheffer, Tobias and Joachims, Thorsten},
booktitle = {International Conference on Machine Learning},
year = {1999},
pages = {361--370},
url = {https://mlanthology.org/icml/1999/scheffer1999icml-expected/}
}