Bias in Information-Based Measures in Decision Tree Induction

Abstract

A fresh look is taken at the problem of bias in information-based attribute selection measures, used in the induction of decision trees. The approach uses statistical simulation techniques to demonstrate that the usual measures such as information gain, gain ratio, and a new measure recently proposed by Lopez de Mantaras (1991) are all biased in favour of attributes with large numbers of values. It is concluded that approaches which utilise the chi-square distribution are preferable because they compensate automatically for differences between attributes in the number of levels they take.
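The kind of simulation the abstract describes can be illustrated with a minimal sketch. It is not the authors' experimental design: the function names, the binary class, the sample size, and the trial count are all assumptions chosen for illustration. Each trial draws an attribute that is statistically independent of the class, so any information gain it earns is spurious, and the sketch averages that gain for attributes with different numbers of values.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attr, cls):
    """Information gain, in bits, from partitioning cls by the values of attr."""
    n = len(cls)
    groups = {}
    for a, c in zip(attr, cls):
        groups.setdefault(a, []).append(c)
    # Weighted average entropy of the class within each attribute value.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(cls) - remainder


def mean_null_gain(n_values, n_samples=200, n_trials=300, seed=0):
    """Average gain of a uniformly random attribute with n_values levels.

    The attribute is drawn independently of a random binary class, so the
    true association is zero (hypothetical setup, not taken from the paper).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        cls = [rng.randrange(2) for _ in range(n_samples)]
        attr = [rng.randrange(n_values) for _ in range(n_samples)]
        total += information_gain(attr, cls)
    return total / n_trials


if __name__ == "__main__":
    for k in (2, 5, 10, 20):
        print(k, round(mean_null_gain(k), 4))
```

Under these assumptions the average gain should rise with the number of attribute values even though no attribute carries any information about the class. This is consistent with the standard result that, under independence, the gain scaled by 2N ln 2 is approximately chi-square distributed with (k - 1)(c - 1) degrees of freedom for k attribute values, c classes, and N samples, so a test based on that distribution accounts for k where raw gain does not.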

Cite

Text

White and Liu. "Bias in Information-Based Measures in Decision Tree Induction." Machine Learning, 1994. doi:10.1023/A:1022694010754

Markdown

[White and Liu. "Bias in Information-Based Measures in Decision Tree Induction." Machine Learning, 1994.](https://mlanthology.org/mlj/1994/white1994mlj-bias/) doi:10.1023/A:1022694010754

BibTeX

@article{white1994mlj-bias,
  title     = {{Bias in Information-Based Measures in Decision Tree Induction}},
  author    = {White, Allan P. and Liu, Wei Zhong},
  journal   = {Machine Learning},
  year      = {1994},
  pages     = {321--329},
  doi       = {10.1023/A:1022694010754},
  volume    = {15},
  url       = {https://mlanthology.org/mlj/1994/white1994mlj-bias/}
}