On the Sample Complexity of Weak Learning
Abstract
In this paper, we study the sample complexity of weak learning: how much data must be collected from an unknown distribution in order to extract a small but significant advantage in prediction? Such a study is motivated in part by settings in which there is a limited supply of examples. We show that it is important to distinguish between those learning algorithms that output deterministic hypotheses and those that output randomized hypotheses; this is distinct from the learning algorithm itself being randomized, which we always assume may be the case. We prove that in the weak learning model, any algorithm using deterministic hypotheses to weakly learn a class of Vapnik-Chervonenkis dimension d(n) requires Ω(√(d(n))) examples. In contrast, when randomized hypotheses are allowed, we give an efficient algorithm that weakly learns against any distribution on a set of size d(n) using Θ(1) examples. (Here and below we assume that d(n) is polynomial in n.) Using a general and simple sampling technique for converting any weak learning algorithm using randomized hypotheses into one using deterministic hypotheses, we show that there exists an efficient algorithm using deterministic hypotheses that weakly learns against any distribution on a set of size d(n) with only O(d(n)^{2/3}) examples. We apply these results to the class of symmetric Boolean functions over n variables, where the strong learning sample complexity is Θ(n), the sample complexity for weak learning using deterministic hypotheses is Ω(√n) and O(n^{2/3}), and the sample complexity for weak learning using randomized hypotheses is Θ(1). These results show that the sample complexity for weak learning may be considerably smaller than for strong learning, and that the power of using randomized hypotheses for weak learning may be dramatic.
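The Θ(1) upper bound for randomized hypotheses has a simple intuition: a hypothesis that memorizes a single labeled example and flips a fair coin everywhere else is correct with probability 1/2 + Σ_x D(x)²/2, which is at least 1/2 + 1/(2d) on a support of size d by Cauchy-Schwarz. The following Monte Carlo sketch illustrates only this intuition, not the paper's actual construction; the support size, distribution weights, and labeling below are arbitrary choices for illustration:

```python
import random

def simulate_memorize_one(d=8, trials=200_000, seed=0):
    """Estimate the accuracy of a randomized hypothesis that memorizes
    one labeled example from an arbitrary distribution on d points and
    guesses by coin flip everywhere else."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(d)]      # arbitrary target labeling
    weights = [rng.random() for _ in range(d)]           # arbitrary distribution D
    correct = 0
    for _ in range(trials):
        # Draw the single training example from D.
        x0 = rng.choices(range(d), weights=weights)[0]
        y0 = labels[x0]
        # Draw an independent test point from D.
        x = rng.choices(range(d), weights=weights)[0]
        # Memorized point: answer from memory; otherwise, flip a coin.
        pred = y0 if x == x0 else rng.randint(0, 1)
        correct += (pred == labels[x])
    return correct / trials

acc = simulate_memorize_one()
```

The measured accuracy exceeds 1/2 by roughly Σ_x D(x)²/2 ≥ 1/(2d), a small but significant edge obtained from a single example.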
Next we prove the existence of classes C_n over {0,1}^n whose Vapnik-Chervonenkis dimension is Θ(n) and whose weak learning sample complexity is Θ(n) (regardless of whether the hypotheses used are deterministic or randomized). Thus, there are classes for which the distribution-free sample size required to obtain a slight advantage in prediction over random guessing is essentially equal to that required to obtain arbitrary accuracy. Finally, for a class of small circuits, namely all parity functions of subsets of n Boolean variables, we prove a weak learning sample complexity of Θ(n). This bound holds even if the weak learning algorithm is allowed to replace random sampling with membership queries, and the target distribution is uniform on {0,1}^n.
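The Θ(n) lower bound for parities rests on a counting argument: k examples impose k linear constraints over GF(2), so at least 2^(n−k) parity functions remain consistent with the sample, and any two distinct parities disagree on exactly half of {0,1}^n under the uniform distribution, leaving no predictive advantage until nearly n examples have been seen. A brute-force sketch of the counting step (the values of n, the target subset, and the random seed are arbitrary illustrative choices, not from the paper):

```python
import random
from itertools import product

def consistent_parities(n, examples):
    """Count subsets S of the n variables whose parity function
    agrees with every labeled example (x, y)."""
    count = 0
    for S in product([0, 1], repeat=n):
        if all(sum(s * xi for s, xi in zip(S, x)) % 2 == y
               for x, y in examples):
            count += 1
    return count

rng = random.Random(1)
n = 6
target = [1, 0, 1, 1, 0, 0]              # hidden parity subset (illustrative)
xs = [[rng.randint(0, 1) for _ in range(n)] for _ in range(3)]
examples = [(x, sum(t * xi for t, xi in zip(target, x)) % 2) for x in xs]
count = consistent_parities(n, examples)  # at least 2^(n-3) = 8 survivors
```

Each example can at best halve the set of consistent parities, so with k ≪ n examples exponentially many candidates survive, and the surviving wrong candidates are uncorrelated with the target on the uniform distribution.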
Cite
Goldman et al. "On the Sample Complexity of Weak Learning." Annual Conference on Computational Learning Theory, 1990. doi:10.1016/B978-1-55860-146-8.50020-5
@inproceedings{goldman1990colt-sample,
title = {{On the Sample Complexity of Weak Learning}},
author = {Goldman, Sally A. and Kearns, Michael J. and Schapire, Robert E.},
booktitle = {Annual Conference on Computational Learning Theory},
year = {1990},
pages = {217-231},
doi = {10.1016/B978-1-55860-146-8.50020-5},
url = {https://mlanthology.org/colt/1990/goldman1990colt-sample/}
}