Supervised and Unsupervised Discretization of Continuous Features

Abstract

Many supervised machine learning algorithms require a discrete feature space. In this paper, we review previous work on continuous feature discretization, identify defining characteristics of the methods, and conduct an empirical evaluation of several methods. We compare binning, an unsupervised discretization method, with entropy-based and purity-based methods, which are supervised algorithms. We find that the performance of the Naive-Bayes algorithm significantly improves when features are discretized using an entropy-based method; in fact, over the 16 tested datasets, the discretized version of Naive-Bayes slightly outperforms C4.5 on average. We also show that in some cases the performance of the C4.5 induction algorithm significantly improves if features are discretized in advance; in our experiments, performance never significantly degraded, an interesting result given that C4.5 is capable of discretizing features locally.
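The two method families compared in the abstract can be sketched minimally. Below is an illustrative Python sketch, not code from the paper: equal-width binning (unsupervised) ignores class labels, while an entropy-based split (supervised) picks the cut point that minimizes the weighted class entropy of the resulting partitions, as in one level of a recursive entropy discretizer. Function names and the default bin count are assumptions for illustration.

```python
from collections import Counter
from math import log2

def equal_width_bins(values, k=10):
    """Unsupervised discretization: split the observed range into k
    equal-width intervals and map each value to its bin index.
    (k=10 is an assumed default, not a value from the paper.)"""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1  # guard against a constant feature
    return [min(int((v - lo) / width), k - 1) for v in values]

def entropy(labels):
    """Class entropy of a label multiset, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_entropy_split(values, labels):
    """Supervised discretization (one step): return the cut point that
    minimizes the size-weighted entropy of the two partitions it induces.
    A full entropy-based discretizer would apply this recursively."""
    pairs = sorted(zip(values, labels))
    best, best_cut = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # only cut between distinct feature values
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if e < best:
            best, best_cut = e, (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut
```

For example, with values `[1, 2, 3, 4]` and labels `[0, 0, 1, 1]`, the entropy-based split lands at 2.5, exactly separating the classes, whereas equal-width binning would place its boundaries without consulting the labels at all.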

Cite

Text

Dougherty et al. "Supervised and Unsupervised Discretization of Continuous Features." International Conference on Machine Learning, 1995. doi:10.1016/B978-1-55860-377-6.50032-3

Markdown

[Dougherty et al. "Supervised and Unsupervised Discretization of Continuous Features." International Conference on Machine Learning, 1995.](https://mlanthology.org/icml/1995/dougherty1995icml-supervised/) doi:10.1016/B978-1-55860-377-6.50032-3

BibTeX

@inproceedings{dougherty1995icml-supervised,
  title     = {{Supervised and Unsupervised Discretization of Continuous Features}},
  author    = {Dougherty, James and Kohavi, Ron and Sahami, Mehran},
  booktitle = {International Conference on Machine Learning},
  year      = {1995},
  pages     = {194--202},
  doi       = {10.1016/B978-1-55860-377-6.50032-3},
  url       = {https://mlanthology.org/icml/1995/dougherty1995icml-supervised/}
}