On the Complexity of Learning from Drifting Distributions
Abstract
We consider two models of on-line learning of binary-valued functions from drifting distributions due to Bartlett. We show that if each example is drawn from a joint distribution which changes in total variation distance by at most O(ε³/(d log(1/ε))) between trials, then an algorithm can achieve a probability of a mistake at most ε worse than the best function in a class of VC-dimension d. We prove a corresponding necessary condition of O(ε³/d). Finally, in the case that a fixed function is to be learned from noise-free examples, we show that if the distributions on the domain generating the examples change by at most O(ε²/(d log(1/ε))), then any consistent algorithm learns to within accuracy ε.
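For readers skimming the result, the abstract's bounds can be restated in display form. This is our paraphrase, not the paper's own statement: the symbols D_t (the joint distribution on trial t), P_t (the domain distribution in the noise-free case), F (the function class), and d_TV (total variation distance) are assumed notation, not taken from the source.

% Agnostic tracking (assumed notation): if the per-trial drift satisfies
\[
  d_{\mathrm{TV}}(D_t, D_{t+1}) = O\!\left(\frac{\varepsilon^3}{d \log(1/\varepsilon)}\right)
  \quad \text{for all } t,
\]
% then some algorithm achieves a mistake probability at most
% \varepsilon worse than the best function in F, where F has
% VC-dimension d; a drift rate of \Omega(\varepsilon^3 / d) is
% necessary, so the sufficient condition is tight up to the log factor.

% Realizable case (assumed notation): if a fixed target is learned from
% noise-free examples and the domain distributions satisfy
\[
  d_{\mathrm{TV}}(P_t, P_{t+1}) = O\!\left(\frac{\varepsilon^2}{d \log(1/\varepsilon)}\right),
\]
% then any consistent algorithm learns to within accuracy \varepsilon.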
Cite
Text
Barve and Long. "On the Complexity of Learning from Drifting Distributions." Annual Conference on Computational Learning Theory, 1996. doi:10.1145/238061.238077
Markdown
[Barve and Long. "On the Complexity of Learning from Drifting Distributions." Annual Conference on Computational Learning Theory, 1996.](https://mlanthology.org/colt/1996/barve1996colt-complexity/) doi:10.1145/238061.238077
BibTeX
@inproceedings{barve1996colt-complexity,
title = {{On the Complexity of Learning from Drifting Distributions}},
author = {Barve, Rakesh D. and Long, Philip M.},
booktitle = {Annual Conference on Computational Learning Theory},
year = {1996},
pages = {122-130},
doi = {10.1145/238061.238077},
url = {https://mlanthology.org/colt/1996/barve1996colt-complexity/}
}