Proportional K-Interval Discretization for Naive-Bayes Classifiers
Abstract
This paper argues that two commonly used discretization approaches, fixed k-interval discretization and entropy-based discretization, have sub-optimal characteristics for naive-Bayes classification. This analysis leads to a new discretization method, Proportional k-Interval Discretization (PKID), which adjusts the number and size of discretized intervals to the number of training instances and thus seeks an appropriate trade-off between the bias and variance of the probability estimation for naive-Bayes classifiers. We justify PKID in theory and test it on a wide cross-section of datasets. Our experimental results suggest that, in comparison to its alternatives, PKID provides naive-Bayes classifiers with competitive classification performance for smaller datasets and better classification performance for larger datasets.
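The abstract does not state the rule by which interval number and size follow the training-set size. As a minimal sketch, assuming the rule described in the full paper (for n training instances with known values of an attribute, both the number of intervals and the number of instances per interval are set to roughly the square root of n), the idea could look like the following. The function names `pkid_cut_points` and `discretize` are hypothetical, and placing boundaries at midpoints between neighbouring sorted values is an illustrative choice, not something taken from the paper.

```python
import math
from bisect import bisect_right


def pkid_cut_points(values):
    """Return cut points for about sqrt(n) equal-frequency intervals.

    Assumption: interval count t = floor(sqrt(n)), so each interval
    holds roughly sqrt(n) training values.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return []
    t = max(1, math.isqrt(n))  # number of intervals
    cuts = []
    for i in range(1, t):
        idx = i * n // t  # index of the first value in interval i
        # Illustrative choice: boundary halfway between neighbours.
        cut = (ordered[idx - 1] + ordered[idx]) / 2
        # Skip repeated boundaries caused by many tied values.
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def discretize(value, cuts):
    """Map a numeric value to the index of its interval."""
    return bisect_right(cuts, value)


if __name__ == "__main__":
    training_values = list(range(1, 17))  # 16 values, so 4 intervals
    cuts = pkid_cut_points(training_values)
    print(cuts)
    print([discretize(v, cuts) for v in (1, 5, 9, 16)])
```

Under this assumption, more training data yields both more intervals (finer resolution, which the abstract associates with bias) and more instances per interval (more stable estimates, which it associates with variance), whereas fixed k-interval discretization keeps the interval count constant regardless of data size.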
Cite
Text
Yang and Webb. "Proportional K-Interval Discretization for Naive-Bayes Classifiers." European Conference on Machine Learning, 2001. doi:10.1007/3-540-44795-4_48
Markdown
[Yang and Webb. "Proportional K-Interval Discretization for Naive-Bayes Classifiers." European Conference on Machine Learning, 2001.](https://mlanthology.org/ecmlpkdd/2001/yang2001ecml-proportional/) doi:10.1007/3-540-44795-4_48
BibTeX
@inproceedings{yang2001ecml-proportional,
title = {{Proportional K-Interval Discretization for Naive-Bayes Classifiers}},
author = {Yang, Ying and Webb, Geoffrey I.},
booktitle = {European Conference on Machine Learning},
year = {2001},
pages = {564-575},
doi = {10.1007/3-540-44795-4_48},
url = {https://mlanthology.org/ecmlpkdd/2001/yang2001ecml-proportional/}
}