Greedy Attribute Selection

Abstract

Many real-world domains bless us with a wealth of attributes to use for learning. This blessing is often a curse: most inductive methods generalize worse given too many attributes than if given a good subset of those attributes. We examine this problem for two learning tasks taken from a calendar scheduling domain. We show that ID3/C4.5 generalizes poorly on these tasks if allowed to use all available attributes. We examine five greedy hillclimbing procedures that search for attribute sets that generalize well with ID3/C4.5. Experiments suggest hillclimbing in attribute space can yield substantial improvements in generalization performance. We present a caching scheme that makes attribute hillclimbing more practical computationally. We also compare the results of hillclimbing in attribute space with FOCUS and RELIEF on the two tasks.
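The core idea — greedy hillclimbing in attribute space, adding one attribute at a time as long as estimated generalization improves — can be sketched in a few lines. This is a minimal illustration, not the paper's procedure: the inducer here is a toy majority-class-per-attribute-tuple classifier scored with leave-one-out accuracy, standing in for the ID3/C4.5 runs used in the paper.

```python
from collections import Counter

def loo_accuracy(data, labels, attrs):
    """Leave-one-out accuracy of a majority-class classifier that groups
    instances by their values on the attribute subset `attrs`."""
    if not attrs:
        # With no attributes, predict the majority class of the other instances.
        correct = 0
        for i, y in enumerate(labels):
            rest = labels[:i] + labels[i + 1:]
            correct += Counter(rest).most_common(1)[0][0] == y
        return correct / len(labels)
    correct = 0
    for i, (x, y) in enumerate(zip(data, labels)):
        key = tuple(x[a] for a in attrs)
        votes = Counter(labels[j] for j, z in enumerate(data)
                        if j != i and tuple(z[a] for a in attrs) == key)
        if votes and votes.most_common(1)[0][0] == y:
            correct += 1
    return correct / len(labels)

def greedy_forward_selection(data, labels):
    """Forward hillclimbing in attribute space: start from the empty set,
    repeatedly add the attribute that most improves estimated accuracy,
    and stop when no single addition helps."""
    selected = []
    best = loo_accuracy(data, labels, selected)
    remaining = set(range(len(data[0])))
    while remaining:
        score, attr = max((loo_accuracy(data, labels, selected + [a]), a)
                          for a in remaining)
        if score <= best:
            break
        best = score
        selected.append(attr)
        remaining.remove(attr)
    return selected, best
```

For example, on a dataset where attribute 0 determines the class and attribute 1 is noise, the search selects attribute 0 and stops, since adding the noise attribute fragments the data and lowers the leave-one-out estimate. The paper's backward and stepwise variants differ only in the moves allowed at each step (removing or swapping attributes rather than only adding them).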

Cite

Text

Caruana and Freitag. "Greedy Attribute Selection." International Conference on Machine Learning, 1994. doi:10.1016/B978-1-55860-335-6.50012-X

Markdown

[Caruana and Freitag. "Greedy Attribute Selection." International Conference on Machine Learning, 1994.](https://mlanthology.org/icml/1994/caruana1994icml-greedy/) doi:10.1016/B978-1-55860-335-6.50012-X

BibTeX

@inproceedings{caruana1994icml-greedy,
  title     = {{Greedy Attribute Selection}},
  author    = {Caruana, Rich and Freitag, Dayne},
  booktitle = {International Conference on Machine Learning},
  year      = {1994},
  pages     = {28--36},
  doi       = {10.1016/B978-1-55860-335-6.50012-X},
  url       = {https://mlanthology.org/icml/1994/caruana1994icml-greedy/}
}