Compression-Based Discretization of Continuous Attributes

Abstract

Discretization of continuous attributes into ordered discrete attributes can be beneficial even for propositional induction algorithms that are capable of handling continuous attributes directly. Benefits include possibly large improvements in induction time, smaller induced trees or rule sets, and even improved predictive accuracy. We define a global evaluation measure for discretizations based on the so-called Minimum Description Length (MDL) principle from information theory. Furthermore, we describe the efficient algorithmic usage of this measure in the MDL-Disc algorithm. The new method solves some problems of alternative local measures used for discretization. Empirical results in a few natural domains and extensive experiments in an artificial domain show that MDL-Disc scales up well to large learning problems involving noise.
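To make the idea of an MDL-based global evaluation measure concrete, the sketch below scores a candidate discretization by its total description length: a model cost for stating which cut points were chosen out of all candidate cut points, plus a data cost for encoding the class labels within each resulting interval. This is a generic illustration of MDL scoring, not the paper's exact coding scheme; the function names and the particular encoding (a binomial code for the cuts, empirical class entropy for the labels) are assumptions for the example.

```python
import math
from collections import Counter

def data_code_length(labels):
    """Approximate code length (in bits) for the class labels of one
    interval: n times the empirical class entropy of the interval."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum(c * math.log2(c / n) for c in counts.values())

def mdl_cost(values, labels, cut_points, n_candidates):
    """Total description length of a discretization (illustrative coding,
    not the exact MDL-Disc scheme): cost of identifying the chosen cuts
    among all candidates, plus the per-interval label code lengths."""
    k = len(cut_points)
    # Model cost: state how many cuts were used, then which subset of
    # the n_candidates candidate cut points was chosen.
    model_cost = math.log2(n_candidates + 1) + math.log2(math.comb(n_candidates, k))
    # Data cost: partition the labels by interval and sum code lengths.
    intervals = [[] for _ in range(k + 1)]
    for v, lab in zip(values, labels):
        idx = sum(v > c for c in cut_points)  # index of the interval v falls into
        intervals[idx].append(lab)
    data_cost = sum(data_code_length(iv) for iv in intervals)
    return model_cost + data_cost
```

Under this kind of measure, a cut that cleanly separates the classes lowers the data cost by more than it raises the model cost, so the discretization with the cut has the smaller total description length; spurious cuts on noisy data pay the model cost without a compensating saving, which is what lets an MDL score penalize overfitting globally rather than per split.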

Cite

Text

Pfahringer. "Compression-Based Discretization of Continuous Attributes." International Conference on Machine Learning, 1995. doi:10.1016/B978-1-55860-377-6.50063-3

Markdown

[Pfahringer. "Compression-Based Discretization of Continuous Attributes." International Conference on Machine Learning, 1995.](https://mlanthology.org/icml/1995/pfahringer1995icml-compression/) doi:10.1016/B978-1-55860-377-6.50063-3

BibTeX

@inproceedings{pfahringer1995icml-compression,
  title     = {{Compression-Based Discretization of Continuous Attributes}},
  author    = {Pfahringer, Bernhard},
  booktitle = {International Conference on Machine Learning},
  year      = {1995},
  pages     = {456--463},
  doi       = {10.1016/B978-1-55860-377-6.50063-3},
  url       = {https://mlanthology.org/icml/1995/pfahringer1995icml-compression/}
}