Minimum Splits Based Discretization for Continuous Features
Abstract
Discretization refers to splitting the range of continuous values into intervals so as to provide useful information about classes. This is usually done by optimizing a goodness measure, subject to constraints such as a maximal number of intervals, a minimal number of examples per interval, or a stopping criterion for splitting. We take a different approach by searching for minimum splits, which minimize the number of intervals with respect to a threshold on impurity (i.e., badness). We propose a "total entropy" motivated selection of the "best" split from the minimum splits, without requiring additional constraints. Experiments show that the proposed method produces better decision trees.

1 Introduction

Continuous values refer to linearly ordered values, mainly numeric values. While continuous values are common in real applications, many learning algorithms focus on unordered discrete values. A common practice is to discretize continuous values into intervals so as t...
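The full paper is not reproduced on this page, but the idea stated in the abstract can be sketched: find a partition of a sorted feature into the fewest intervals whose class entropy stays below a threshold, and among such minimum splits prefer the one with the smallest total (example-weighted) entropy. The following is a minimal illustrative sketch only, not the authors' algorithm; the function names and the O(n²) dynamic program are my own assumptions.

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of the class labels in one interval."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def minimum_split(values, labels, threshold):
    """Partition `values` into the fewest intervals whose entropy is
    <= threshold; ties broken by smallest total weighted entropy.
    Returns (cut_points, (num_intervals, total_entropy))."""
    pairs = sorted(zip(values, labels))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(ys)
    INF = float("inf")
    # dp[i] = (min #intervals, min total weighted entropy) for first i examples
    dp = [(INF, INF)] * (n + 1)
    dp[0] = (0, 0.0)
    back = [None] * (n + 1)
    for i in range(1, n + 1):
        for j in range(i):
            if j > 0 and xs[j - 1] == xs[j]:
                continue  # cannot cut between equal feature values
            h = entropy(ys[j:i])
            if h <= threshold and dp[j][0] < INF:
                cand = (dp[j][0] + 1, dp[j][1] + (i - j) / n * h)
                if cand < dp[i]:
                    dp[i], back[i] = cand, j
    cuts = []  # recover cut points as midpoints between adjacent values
    i = n
    while back[i] not in (None, 0):
        j = back[i]
        cuts.append((xs[j - 1] + xs[j]) / 2)
        i = j
    return sorted(cuts), dp[n]
```

For example, with values `[1, 2, 3, 10, 11, 12]`, labels `"aaabbb"`, and threshold `0.0`, the sketch returns a single cut at `6.5` and two intervals, since each side is class-pure.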
Wang, Ke and Goh, Han Chong. "Minimum Splits Based Discretization for Continuous Features." In International Joint Conference on Artificial Intelligence, 1997, pp. 942-951.
@inproceedings{wang1997ijcai-minimum,
title = {{Minimum Splits Based Discretization for Continuous Features}},
author = {Wang, Ke and Goh, Han Chong},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {1997},
pages = {942-951},
url = {https://mlanthology.org/ijcai/1997/wang1997ijcai-minimum/}
}