Bias in Information-Based Measures in Decision Tree Induction
Abstract
A fresh look is taken at the problem of bias in information-based attribute selection measures, used in the induction of decision trees. The approach uses statistical simulation techniques to demonstrate that the usual measures such as information gain, gain ratio, and a new measure recently proposed by Lopez de Mantaras (1991) are all biased in favour of attributes with large numbers of values. It is concluded that approaches which utilise the chi-square distribution are preferable because they compensate automatically for differences between attributes in the number of levels they take.
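The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experimental design; it assumes a binary class, attributes drawn uniformly at random and independently of the class, and a sample size of 100, and the function names (`entropy`, `information_gain`, `mean_gain_under_null`) are hypothetical. Because every attribute is pure noise, an unbiased measure would score them all about the same, yet the mean information gain grows with the number of attribute values.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attr, labels):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    n = len(labels)
    groups = {}
    for a, y in zip(attr, labels):
        groups.setdefault(a, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def mean_gain_under_null(n_values, n_samples=100, n_trials=300, seed=0):
    """Average information gain of an attribute that is independent of the class.

    Assumed setup (illustrative only): binary class, attribute values drawn
    uniformly from n_values levels, both generated at random.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        labels = [rng.randrange(2) for _ in range(n_samples)]
        attr = [rng.randrange(n_values) for _ in range(n_samples)]
        total += information_gain(attr, labels)
    return total / n_trials


# Mean gain for uninformative attributes with 2, 5, 10 and 20 levels.
mean_gains = {k: mean_gain_under_null(k) for k in (2, 5, 10, 20)}
for k, g in mean_gains.items():
    print(f"{k:2d} values: mean information gain = {g:.4f}")
```

A tree inducer that picks the attribute with the largest gain would therefore tend to prefer the 20-level noise attribute over the 2-level one. A chi-square based test accounts for the number of levels through its degrees of freedom, which is the kind of compensation the abstract refers to.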
Cite
Text
White and Liu. "Bias in Information-Based Measures in Decision Tree Induction." Machine Learning, 1994. doi:10.1023/A:1022694010754
Markdown
[White and Liu. "Bias in Information-Based Measures in Decision Tree Induction." Machine Learning, 1994.](https://mlanthology.org/mlj/1994/white1994mlj-bias/) doi:10.1023/A:1022694010754
BibTeX
@article{white1994mlj-bias,
  title = {{Bias in Information-Based Measures in Decision Tree Induction}},
  author = {White, Allan P. and Liu, Wei Zhong},
  journal = {Machine Learning},
  year = {1994},
  pages = {321--329},
  doi = {10.1023/A:1022694010754},
  volume = {15},
  url = {https://mlanthology.org/mlj/1994/white1994mlj-bias/}
}