On the Complexity of Learning from Drifting Distributions

Abstract

We consider two models of on-line learning of binary-valued functions from drifting distributions due to Bartlett. We show that if each example is drawn from a joint distribution which changes in total variation distance by at most O(ε³/(d log(1/ε))) between trials, then an algorithm can achieve a probability of a mistake at most ε worse than the best function in a class of VC-dimension d. We prove a corresponding necessary condition of O(ε³/d). Finally, in the case that a fixed function is to be learned from noise-free examples, we show that if the distributions on the domain generating the examples change by at most O(ε²/(d log(1/ε))), then any consistent algorithm learns to within accuracy ε.
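To make the two rates concrete, the minimal sketch below evaluates the per-trial drift tolerances stated in the abstract for sample values of ε and d. It treats the constants hidden in the O(·) notation as 1, which is an assumption for illustration only; the function names are likewise hypothetical and do not appear in the paper.

import math

# Illustrative only: hidden O(.) constants assumed to be 1.

def agnostic_drift_bound(eps: float, d: int) -> float:
    """Sufficient per-trial drift (total variation) for tracking within
    eps of the best function in a class of VC-dimension d: eps^3 / (d ln(1/eps))."""
    return eps**3 / (d * math.log(1 / eps))

def realizable_drift_bound(eps: float, d: int) -> float:
    """Noise-free case: drift of eps^2 / (d ln(1/eps)) per trial suffices
    for any consistent algorithm to reach accuracy eps."""
    return eps**2 / (d * math.log(1 / eps))

if __name__ == "__main__":
    eps, d = 0.1, 10
    print(f"agnostic:   {agnostic_drift_bound(eps, d):.2e}")   # ~4.34e-05
    print(f"realizable: {realizable_drift_bound(eps, d):.2e}")  # ~4.34e-04

Note how the noise-free tolerance is a factor of 1/ε larger than the agnostic one, reflecting the ε³ versus ε² dependence in the two bounds.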

Cite

Text

Barve and Long. "On the Complexity of Learning from Drifting Distributions." Annual Conference on Computational Learning Theory, 1996. doi:10.1145/238061.238077

Markdown

[Barve and Long. "On the Complexity of Learning from Drifting Distributions." Annual Conference on Computational Learning Theory, 1996.](https://mlanthology.org/colt/1996/barve1996colt-complexity/) doi:10.1145/238061.238077

BibTeX

@inproceedings{barve1996colt-complexity,
  title     = {{On the Complexity of Learning from Drifting Distributions}},
  author    = {Barve, Rakesh D. and Long, Philip M.},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {1996},
  pages     = {122--130},
  doi       = {10.1145/238061.238077},
  url       = {https://mlanthology.org/colt/1996/barve1996colt-complexity/}
}