Empirical Risk Minimization Versus Maximum-Likelihood Estimation: A Case Study

Abstract

We study the interaction between input distributions, learning algorithms, and finite sample sizes for classification tasks. Focusing on the case of normal input distributions, we use statistical mechanics techniques to calculate the empirical and expected (or generalization) errors for several well-known algorithms that learn the weights of a single-layer perceptron. In the case of spherically symmetric distributions within each class, we find that the simple Hebb rule, corresponding to maximum-likelihood parameter estimation, outperforms the other, more complex algorithms based on error minimization. Moreover, we show that in the regime where the overlap between the classes is large, algorithms with low empirical error generalize worse, a phenomenon known as overtraining.
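The contrast the abstract draws can be illustrated with a small numerical sketch. Below is a hypothetical setup (not from the paper): two spherical Gaussian classes with means ±mu, a Hebb-rule weight vector (the sample average of y·x, which is the maximum-likelihood estimate of the class-mean direction in this model), and a simple perceptron-style update loop standing in for an empirical-error-minimizing algorithm. All parameter values (dimension, sample size, mean vector) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 20, 50, 5000

# Two spherical Gaussian classes with means +mu and -mu.
# A small ||mu|| gives the large class overlap discussed in the abstract.
mu = np.full(d, 0.3)  # assumed, illustrative value

def sample(n):
    y = rng.choice([-1, 1], size=n)
    X = y[:, None] * mu + rng.standard_normal((n, d))
    return X, y

Xtr, ytr = sample(n_train)
Xte, yte = sample(n_test)

# Hebb rule: w = (1/n) * sum_i y_i x_i, i.e. maximum-likelihood
# estimation of the class-mean direction for this Gaussian model.
w_hebb = (ytr[:, None] * Xtr).mean(axis=0)

# A stand-in for empirical risk minimization: perceptron updates that
# drive the training error down (not the paper's exact algorithms).
w_erm = w_hebb.copy()
for _ in range(200):
    miss = np.sign(Xtr @ w_erm) != ytr
    if not miss.any():
        break
    i = np.flatnonzero(miss)[0]
    w_erm += ytr[i] * Xtr[i]

def error(w, X, y):
    """Fraction of points misclassified by sign(w . x)."""
    return np.mean(np.sign(X @ w) != y)

print("Hebb:  empirical", error(w_hebb, Xtr, ytr), "test", error(w_hebb, Xte, yte))
print("ERM:   empirical", error(w_erm, Xtr, ytr), "test", error(w_erm, Xte, yte))
```

With overlapping classes, the update loop can reduce empirical error below the Hebb rule's while its test error stays at or above it, the overtraining effect the abstract describes; exact numbers depend on the random draw.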

Cite

Text

Meir. "Empirical Risk Minimization Versus Maximum-Likelihood Estimation: A Case Study." Neural Computation, 1995. doi:10.1162/NECO.1995.7.1.144

Markdown

[Meir. "Empirical Risk Minimization Versus Maximum-Likelihood Estimation: A Case Study." Neural Computation, 1995.](https://mlanthology.org/neco/1995/meir1995neco-empirical/) doi:10.1162/NECO.1995.7.1.144

BibTeX

@article{meir1995neco-empirical,
  title     = {{Empirical Risk Minimization Versus Maximum-Likelihood Estimation: A Case Study}},
  author    = {Meir, Ronny},
  journal   = {Neural Computation},
  year      = {1995},
  pages     = {144--157},
  doi       = {10.1162/NECO.1995.7.1.144},
  volume    = {7},
  url       = {https://mlanthology.org/neco/1995/meir1995neco-empirical/}
}