Proportional K-Interval Discretization for Naive-Bayes Classifiers

Abstract

This paper argues that two commonly-used discretization approaches, fixed k-interval discretization and entropy-based discretization, have sub-optimal characteristics for naive-Bayes classification. This analysis leads to a new discretization method, Proportional k-Interval Discretization (PKID), which adjusts the number and size of discretized intervals to the number of training instances, thus seeking an appropriate trade-off between the bias and variance of the probability estimation for naive-Bayes classifiers. We justify PKID in theory, as well as test it on a wide cross-section of datasets. Our experimental results suggest that in comparison to its alternatives, PKID provides naive-Bayes classifiers with competitive classification performance for smaller datasets and better classification performance for larger datasets.
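The core idea sketched in the abstract — scaling both the number of intervals and the instances per interval with the training-set size — can be illustrated with a small equal-frequency discretizer. This is a simplified sketch, not the authors' reference implementation: it assumes the common PKID setting where both quantities are set to roughly the square root of the number of training values.

```python
import math

def pkid_discretize(values):
    """Sketch of Proportional k-Interval Discretization (PKID).

    Assumption (not taken verbatim from the abstract): both the number
    of intervals k and the target instances per interval are set to
    about sqrt(n), so each grows with the training-set size n.
    Returns the cut points of an equal-frequency discretization.
    """
    n = len(values)
    k = max(1, int(math.sqrt(n)))          # ~sqrt(n) intervals
    size = max(1, math.ceil(n / k))        # ~sqrt(n) instances per interval
    sorted_vals = sorted(values)
    # Place a cut at the last value of each interval except the final one.
    cuts = [sorted_vals[i * size - 1] for i in range(1, k)
            if i * size - 1 < n - 1]
    return cuts

def assign_interval(x, cuts):
    """Map a value to the index of its discretized interval."""
    for i, c in enumerate(cuts):
        if x <= c:
            return i
    return len(cuts)
```

With 9 training values this yields 3 intervals of 3 instances each; with 100 values it would yield 10 intervals of 10, so both bias and variance of the per-interval probability estimates shift as the data grows.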

Cite

Text

Yang and Webb. "Proportional K-Interval Discretization for Naive-Bayes Classifiers." European Conference on Machine Learning, 2001. doi:10.1007/3-540-44795-4_48

Markdown

[Yang and Webb. "Proportional K-Interval Discretization for Naive-Bayes Classifiers." European Conference on Machine Learning, 2001.](https://mlanthology.org/ecmlpkdd/2001/yang2001ecml-proportional/) doi:10.1007/3-540-44795-4_48

BibTeX

@inproceedings{yang2001ecml-proportional,
  title     = {{Proportional K-Interval Discretization for Naive-Bayes Classifiers}},
  author    = {Yang, Ying and Webb, Geoffrey I.},
  booktitle = {European Conference on Machine Learning},
  year      = {2001},
  pages     = {564--575},
  doi       = {10.1007/3-540-44795-4_48},
  url       = {https://mlanthology.org/ecmlpkdd/2001/yang2001ecml-proportional/}
}