Linkage and Autocorrelation Cause Feature Selection Bias in Relational Learning

Abstract

Two common characteristics of relational data sets, concentrated linkage and relational autocorrelation, can cause learning algorithms to be strongly biased toward certain features, irrespective of their predictive power. We identify these characteristics, define quantitative measures of their severity, and explain how they produce this bias. We show how linkage and autocorrelation affect a representative algorithm for feature selection by applying the algorithm to synthetic data and to data drawn from the Internet Movie Database.

Cite

Text

Jensen and Neville. "Linkage and Autocorrelation Cause Feature Selection Bias in Relational Learning." International Conference on Machine Learning, 2002.

Markdown

[Jensen and Neville. "Linkage and Autocorrelation Cause Feature Selection Bias in Relational Learning." International Conference on Machine Learning, 2002.](https://mlanthology.org/icml/2002/jensen2002icml-linkage/)

BibTeX

@inproceedings{jensen2002icml-linkage,
  title     = {{Linkage and Autocorrelation Cause Feature Selection Bias in Relational Learning}},
  author    = {Jensen, David D. and Neville, Jennifer},
  booktitle = {International Conference on Machine Learning},
  year      = {2002},
  pages     = {259--266},
  url       = {https://mlanthology.org/icml/2002/jensen2002icml-linkage/}
}