Pasting Small Votes for Classification in Large Databases and On-Line

Abstract

Many databases have grown to the point where they cannot fit into the fast memory of even large-memory machines, to say nothing of current workstations. If we want to use these databases to construct predictions of various characteristics, then, since the usual methods require that all data be held in fast memory, various work-arounds have to be used. This paper studies one such class of methods, which are computationally fast and give accuracy comparable to that which could have been obtained if all the data could have been held in core. The procedure takes small pieces of the data, grows a predictor on each small piece, and then pastes these predictors together. A version is given that scales up to terabyte data sets. The methods are also applicable to on-line learning.
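The procedure described in the abstract (take small pieces, grow a predictor on each, paste the predictors together) can be illustrated with a minimal sketch. The details here are assumptions made for illustration and are not taken from the abstract: pieces are drawn uniformly at random, a nearest-centroid classifier stands in for the base predictor, pasting is done by plurality vote, and the function names and synthetic data are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(rows):
    """Fit a nearest-centroid predictor on one small piece of (features, label) rows."""
    sums, counts = {}, Counter()
    for x, y in rows:
        counts[y] += 1
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_one(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)),
    )


def paste_small_votes(data, piece_size, n_pieces, rng):
    """Grow one predictor per small random piece; return the list of predictors."""
    return [fit_centroids(rng.sample(data, piece_size)) for _ in range(n_pieces)]


def vote(predictors, x):
    """Paste the predictors together by plurality vote."""
    ballots = Counter(predict_one(p, x) for p in predictors)
    return ballots.most_common(1)[0][0]


# Synthetic two-class data (hypothetical, for illustration only).
rng = random.Random(0)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(500)]
data += [((rng.gauss(3, 1), rng.gauss(3, 1)), 1) for _ in range(500)]

# Each predictor only ever sees 20 rows, so no step needs the full data in memory.
predictors = paste_small_votes(data, piece_size=20, n_pieces=25, rng=rng)
print(vote(predictors, (0.0, 0.0)), vote(predictors, (3.0, 3.0)))
```

In this sketch the full data set sits in a Python list only for convenience; in an out-of-core setting each piece would presumably be read from disk, since fitting a predictor touches only `piece_size` rows at a time.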

Cite

Text

Breiman. "Pasting Small Votes for Classification in Large Databases and On-Line." Machine Learning, 1999. doi:10.1023/A:1007563306331

Markdown

[Breiman. "Pasting Small Votes for Classification in Large Databases and On-Line." Machine Learning, 1999.](https://mlanthology.org/mlj/1999/breiman1999mlj-pasting/) doi:10.1023/A:1007563306331

BibTeX

@article{breiman1999mlj-pasting,
  title     = {{Pasting Small Votes for Classification in Large Databases and On-Line}},
  author    = {Breiman, Leo},
  journal   = {Machine Learning},
  year      = {1999},
  pages     = {85--103},
  doi       = {10.1023/A:1007563306331},
  volume    = {36},
  url       = {https://mlanthology.org/mlj/1999/breiman1999mlj-pasting/}
}