Online Ensemble Learning

Abstract

Ensemble learning methods train combinations of base models, which may be decision trees, neural networks, or other models traditionally used in supervised learning. Ensemble methods have gained popularity because many researchers have demonstrated their superior prediction performance relative to single models on a variety of problems, especially when the errors made by the base models are weakly correlated (e.g., Freund & Schapire 1996; Tumer & Oza 1999). However, these learning methods have largely operated in batch mode; that is, they repeatedly process the entire set of training examples, and they typically require at least one pass through the data for each base model in the ensemble. We would instead prefer to learn the entire ensemble online, i.e., using only one pass through the dataset. This would make ensemble methods practical when data is generated continuously, so that storing it for batch learning is impractical, or in data mining tasks where the datasets are so large that multiple passes would require prohibitively long training times. We have so far developed online versions of the popular bagging (Breiman 1994) and boosting (Freund & Schapire 1996) algorithms. We have shown empirically that both online algorithms converge to the same prediction performance as the batch versions, and we have proved this convergence for online bagging (Oza 2000).

However, significant empirical and theoretical work remains to be done. Several traditional ensemble learning issues carry over to our online framework, such as the number and types of base models, the combining method, and how to maintain diversity among the base models. When learning from large datasets, we may also hope to avoid using all of the training examples and/or input features. We have developed input decimation (Tumer & Oza 1999), a technique that uses different subsets of the input features in different base models. We have shown that this method performs better than combinations of base models that use all the input features, for two reasons: the base models overfit less by using only a small number of highly relevant input features, and their errors are less correlated because they use different feature subsets. However, our method of selecting input features
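To make the online bagging idea concrete, here is a minimal sketch. It rests on the observation (from the online bagging literature) that in a batch bootstrap sample each training example appears a number of times that is approximately Poisson(1)-distributed, so an online learner can mimic bootstrap sampling by presenting each incoming example to each base model k ~ Poisson(1) times. The class names, the `partial_fit`/`predict` interface, and the toy `MajorityClass` base learner are illustrative assumptions, not the paper's API.

```python
import math
import random


class MajorityClass:
    """Toy incremental base learner (assumption for illustration):
    predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def partial_fit(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0


class OnlineBagging:
    """Sketch of online bagging: each base model sees each incoming
    example k ~ Poisson(1) times, approximating how often that example
    would appear in a bootstrap replicate of the training set."""

    def __init__(self, base_model_factory, n_models=10, rng=None):
        self.models = [base_model_factory() for _ in range(n_models)]
        self.rng = rng or random.Random(0)

    def _poisson1(self):
        # Knuth's method for sampling Poisson(lambda = 1).
        k, p, limit = 0, 1.0, math.exp(-1.0)
        while True:
            p *= self.rng.random()
            if p <= limit:
                return k
            k += 1

    def learn(self, x, y):
        # Present the example k ~ Poisson(1) times to each base model.
        for m in self.models:
            for _ in range(self._poisson1()):
                m.partial_fit(x, y)

    def predict(self, x):
        # Unweighted majority vote over the base models.
        votes = {}
        for m in self.models:
            yhat = m.predict(x)
            votes[yhat] = votes.get(yhat, 0) + 1
        return max(votes, key=votes.get)
```

Because each example is processed once and then discarded, the ensemble can be trained in a single pass over the data, which is the property the abstract emphasizes.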
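The input decimation idea can also be sketched briefly. The version below ranks features by the absolute Pearson correlation between each feature and the label, then hands each base model a different rotation of that ranking so the subsets are relevant but only partially overlapping. Both the relevance score and the subset-assignment scheme are simplifications chosen for illustration; the published method selects features by their correlation with individual classes.

```python
import math


def _abs_corr(xs, ys):
    # |Pearson correlation| between one feature column and the labels.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return abs(cov / (sx * sy)) if sx > 0 and sy > 0 else 0.0


def decimation_subsets(X, y, n_models, subset_size):
    """Assign each base model a small subset of relevant features.

    Features are ranked by relevance to the label; model i receives a
    rotation of that ranking, so subsets stay relevant while differing
    across models (which helps decorrelate their errors).
    """
    n_features = len(X[0])
    scores = [_abs_corr([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: -scores[j])
    return [[ranked[(i + t) % n_features] for t in range(subset_size)]
            for i in range(n_models)]
```

Each base model would then be trained only on its assigned feature columns, realizing the "different subsets of the input features in different base models" design described above.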

Cite

Text

Oza. "Online Ensemble Learning." AAAI Conference on Artificial Intelligence, 2000.

Markdown

[Oza. "Online Ensemble Learning." AAAI Conference on Artificial Intelligence, 2000.](https://mlanthology.org/aaai/2000/oza2000aaai-online/)

BibTeX

@inproceedings{oza2000aaai-online,
  title     = {{Online Ensemble Learning}},
  author    = {Oza, Nikunj C.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2000},
  pages     = {1109},
  url       = {https://mlanthology.org/aaai/2000/oza2000aaai-online/}
}