Feature Weighting in K-Means Clustering

Modha, Dharmendra S.; Spangler, W. Scott

doi:10.1023/A:1024016609528

MLJ 2003 pp. 217-237

Abstract

Data sets with multiple, heterogeneous feature spaces occur frequently. We present an abstract framework for integrating multiple feature spaces in the k-means clustering algorithm. Our main ideas are (i) to represent each data object as a tuple of multiple feature vectors, (ii) to assign a suitable (and possibly different) distortion measure to each feature space, (iii) to combine distortions on different feature spaces, in a convex fashion, by assigning (possibly) different relative weights to each, (iv) for a fixed weighting, to cluster using the proposed convex k-means algorithm, and (v) to determine the optimal feature weighting to be the one that yields the clustering that simultaneously minimizes the average within-cluster dispersion and maximizes the average between-cluster dispersion along all the feature spaces. Using precision/recall evaluations and known ground truth classifications, we empirically demonstrate the effectiveness of feature weighting in clustering on several different application domains.
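Ideas (i) through (iv) of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes squared Euclidean distortion in every feature space (the abstract allows a different measure per space), and the function name `convex_kmeans`, its parameters, and the toy data are hypothetical, chosen only for illustration. Choosing the optimal weighting, idea (v), is not covered.

```python
import numpy as np

def convex_kmeans(spaces, weights, k, n_iter=50, seed=0):
    """Sketch of k-means over several feature spaces with a fixed weighting.

    spaces  : list of arrays of shape (n, d_l), one per feature space
    weights : non-negative weights, one per feature space, summing to 1
    Assumption: squared Euclidean distortion is used in every space.
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    rng = np.random.default_rng(seed)
    n = spaces[0].shape[0]
    init = rng.choice(n, size=k, replace=False)
    # One centroid array per feature space, seeded from k distinct objects.
    centroids = [X[init].astype(float) for X in spaces]
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Convex combination of per-space distortions, shape (n, k).
        dist = sum(
            w * ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
            for w, X, C in zip(weights, spaces, centroids)
        )
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = labels == j
            if members.any():  # keep the old centroid if a cluster empties
                for X, C in zip(spaces, centroids):
                    C[j] = X[members].mean(axis=0)
    return labels, centroids

# Toy data: four objects, each a tuple of two one-dimensional feature vectors.
A = np.array([[0.0], [0.1], [5.0], [5.1]])  # space 1 groups {0, 1} and {2, 3}
B = np.array([[0.0], [5.0], [0.1], [5.1]])  # space 2 groups {0, 2} and {1, 3}

labels_a, _ = convex_kmeans([A, B], [1.0, 0.0], k=2)
labels_b, _ = convex_kmeans([A, B], [0.0, 1.0], k=2)
```

With all weight on the first feature space, objects 0 and 1 share a cluster; with all weight on the second, objects 0 and 2 do. Intermediate weightings trade off the two groupings, which is why the choice of weighting matters.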

Cite

Text

Modha and Spangler. "Feature Weighting in K-Means Clustering." Machine Learning, 2003. doi:10.1023/A:1024016609528

Markdown

[Modha and Spangler. "Feature Weighting in K-Means Clustering." Machine Learning, 2003.](https://mlanthology.org/mlj/2003/modha2003mlj-feature/) doi:10.1023/A:1024016609528

BibTeX

@article{modha2003mlj-feature,
  title     = {{Feature Weighting in K-Means Clustering}},
  author    = {Modha, Dharmendra S. and Spangler, W. Scott},
  journal   = {Machine Learning},
  year      = {2003},
  pages     = {217--237},
  doi       = {10.1023/A:1024016609528},
  volume    = {52},
  url       = {https://mlanthology.org/mlj/2003/modha2003mlj-feature/}
}