Feature Weighting in K-Means Clustering

Abstract

Data sets with multiple, heterogeneous feature spaces occur frequently. We present an abstract framework for integrating multiple feature spaces in the k-means clustering algorithm. Our main ideas are (i) to represent each data object as a tuple of multiple feature vectors, (ii) to assign a suitable (and possibly different) distortion measure to each feature space, (iii) to combine distortions on different feature spaces, in a convex fashion, by assigning (possibly) different relative weights to each, (iv) for a fixed weighting, to cluster using the proposed convex k-means algorithm, and (v) to determine the optimal feature weighting to be the one that yields the clustering that simultaneously minimizes the average within-cluster dispersion and maximizes the average between-cluster dispersion along all the feature spaces. Using precision/recall evaluations and known ground truth classifications, we empirically demonstrate the effectiveness of feature weighting in clustering on several different application domains.
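
Ideas (i) through (iv) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes squared Euclidean distortion on every feature space (the abstract allows a different measure per space), which is why the centroid update below is a per-space mean. The names `weighted_distortion` and `convex_kmeans`, the initialization from given centroids, and the toy data are all hypothetical. The weight selection in (v) is not sketched here.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
DataObject = Tuple[Vector, ...]  # (i) one feature vector per feature space
Distortion = Callable[[Sequence[float], Sequence[float]], float]


def squared_euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """(ii) Distortion on one feature space (assumed choice for this sketch)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def weighted_distortion(x: DataObject, c: DataObject,
                        weights: Sequence[float],
                        distortions: Sequence[Distortion]) -> float:
    """(iii) Convex combination of per-space distortions.

    Weights are assumed non-negative and summing to 1.
    """
    return sum(w * d(xl, cl)
               for w, d, xl, cl in zip(weights, distortions, x, c))


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean; the minimizer of squared Euclidean distortion."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def convex_kmeans(data: Sequence[DataObject],
                  init_centroids: Sequence[DataObject],
                  weights: Sequence[float],
                  distortions: Sequence[Distortion],
                  max_iter: int = 100):
    """(iv) k-means under the weighted distortion, for a fixed weighting."""
    centroids = [tuple(list(v) for v in c) for c in init_centroids]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the weighted distortion.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: weighted_distortion(
                    x, centroids[j], weights, distortions))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: recompute each centroid separately in every space.
        for j in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(
                    mean_vector([m[l] for m in members])
                    for l in range(len(weights))
                )
    return labels, centroids


# Toy data: two feature spaces that suggest different groupings.
data = [
    ([0.0, 0.0], [1.0]),
    ([0.1, 0.0], [9.0]),
    ([5.0, 5.0], [1.1]),
    ([5.1, 5.0], [9.1]),
]
dists = [squared_euclidean, squared_euclidean]

# All weight on the first space, then all weight on the second.
labels_a, _ = convex_kmeans(data, [data[0], data[2]], (1.0, 0.0), dists)
labels_b, _ = convex_kmeans(data, [data[0], data[1]], (0.0, 1.0), dists)
print(labels_a)  # [0, 0, 1, 1]
print(labels_b)  # [0, 1, 0, 1]
```

The two runs show why the weighting matters: the same objects are grouped differently depending on which feature space dominates the combined distortion, so choosing the weights is part of the clustering problem.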

Cite

Text

Modha and Spangler. "Feature Weighting in K-Means Clustering." Machine Learning, 2003. doi:10.1023/A:1024016609528

Markdown

[Modha and Spangler. "Feature Weighting in K-Means Clustering." Machine Learning, 2003.](https://mlanthology.org/mlj/2003/modha2003mlj-feature/) doi:10.1023/A:1024016609528

BibTeX

@article{modha2003mlj-feature,
  title     = {{Feature Weighting in K-Means Clustering}},
  author    = {Modha, Dharmendra S. and Spangler, W. Scott},
  journal   = {Machine Learning},
  year      = {2003},
  pages     = {217-237},
  doi       = {10.1023/A:1024016609528},
  volume    = {52},
  url       = {https://mlanthology.org/mlj/2003/modha2003mlj-feature/}
}