Learning Linear Threshold Functions in the Presence of Classification Noise

Abstract

I show that linear threshold functions are polynomially learnable in the presence of classification noise, i.e., polynomial in n, 1/ε, 1/δ, and 1/σ, where n is the number of Boolean attributes, ε and δ are the usual accuracy and confidence parameters, and σ indicates the minimum distance of any example from the target hyperplane, which is assumed to be lower than the average distance of the examples from any hyperplane. This result is achieved by modifying the Perceptron algorithm—for each update, a weighted average of a sample of misclassified examples and a correction vector is added to the current weight vector. The correction vector is simply the mean of the normalized examples. Similar modifications are shown for the Weighted Majority algorithm. In the special case of Boolean threshold functions, the modified Perceptron algorithm performs O(n²ε⁻²) iterations over O(n⁴ε⁻² ln(n/(δε))) examples. This improves on the previous classification-noise result of Angluin and Laird, extending it to a much larger concept class with a similar number of examples, though with multiple iterations over the examples.
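The update rule described above can be sketched in a few lines. The following is a minimal illustration, not the paper's exact algorithm: the sample size, mixing weights, and learning rate are assumptions, and the misclassification test and normalization follow standard Perceptron conventions.

```python
import numpy as np

def noisy_perceptron_update(w, X, y, eta=1.0):
    """One update of a noise-tolerant Perceptron in the spirit of the
    abstract: add a weighted average of misclassified examples plus a
    correction vector (the mean of the normalized examples) to w.

    w: (n,) current weight vector.
    X: (m, n) sample of examples, normalized to unit length.
    y: (m,) labels in {-1, +1}.
    eta: learning rate (an assumption; the paper's weighting may differ).
    """
    # Correction vector: the mean of the normalized examples.
    correction = X.mean(axis=0)
    # Find examples misclassified by the current weight vector.
    mask = y * (X @ w) <= 0
    if not mask.any():
        return w  # nothing misclassified in this sample
    # Signed misclassified examples, so each update term points
    # toward the correct side of the hyperplane.
    mis = X[mask] * y[mask][:, None]
    # Equal mixing of the two terms is an assumption for illustration.
    return w + eta * (mis.mean(axis=0) + correction)
```

A usage sketch: starting from w = 0 on a small normalized sample, one call moves the weight vector toward the mean of the signed misclassified examples, shifted by the correction term.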

Cite

Text

Tom Bylander. "Learning Linear Threshold Functions in the Presence of Classification Noise." Annual Conference on Computational Learning Theory, 1994. doi:10.1145/180139.181176

Markdown

[Tom Bylander. "Learning Linear Threshold Functions in the Presence of Classification Noise." Annual Conference on Computational Learning Theory, 1994.](https://mlanthology.org/colt/1994/bylander1994colt-learning/) doi:10.1145/180139.181176

BibTeX

@inproceedings{bylander1994colt-learning,
  title     = {{Learning Linear Threshold Functions in the Presence of Classification Noise}},
  author    = {Bylander, Tom},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {1994},
  pages     = {340--347},
  doi       = {10.1145/180139.181176},
  url       = {https://mlanthology.org/colt/1994/bylander1994colt-learning/}
}